METHODS AND COMPOSITIONS COMPRISING
RENILLA GFP



CROSS REFERENCE TO RELATED APPLICATIONS 

This application claims the benefit of the filing date of application United States Serial Number
60/290,287, filed May 10, 2001, and of application United States Serial No. 09/710,058, filed November
10, 2000.

FIELD OF THE INVENTION 

The invention relates to methods and compositions utilizing Renilla green fluorescent proteins (rGFP)
and Ptilosarcus green fluorescent proteins (pGFP). In particular, the invention relates to the use of
rGFP or pGFP proteins as reporters for cell assays, particularly intracellular assays, including methods
of screening libraries using rGFP and pGFP.

BACKGROUND OF THE INVENTION 

The field of biomolecule screening for biologically and therapeutically relevant compounds is rapidly
growing. Relevant biomolecules that have been the focus of such screenings include chemical
libraries, nucleic acid libraries, and peptide libraries, in search of molecules that either inhibit or
augment the biological activity of identified target molecules. With particular regard to peptide
libraries, the isolation of peptide inhibitors of targets and the identification of formal binding partners of
targets has been a key focus. However, one particular problem with peptide libraries is the difficulty of
assessing whether any particular peptide has been expressed, and at what level, prior to determining
whether the peptide has a biological effect.

The green fluorescent protein from Aequorea victoria (hereinafter "aGFP") is a 238 amino acid protein
displaying autofluorescent properties. The crystal structures of the protein and several point mutants
have been solved (Ormö, M. et al. (1996) Science 273: 1392-95; Yang, F. et al. (1996) Nature
Biotechnol. 14: 1246-51). The fluorophore, consisting of a modified tripeptide, is buried inside a
relatively rigid structure, where it is almost completely protected from solvent access. The
protein fluorescence is sensitive to a number of point mutations (Phillips, G.N. (1997) Curr. Opin.
Struct. Biol. 7: 821-27). Since any disruption of the structure allowing solvent access to the
fluorophoric tripeptide results in fluorescence quenching, the fluorescence appears to be a sensitive
indication of the preservation of the native structure of the protein.

Uses of GFP as a biological marker, for example in studies of gene expression, protein targeting and
protein interactions, and in biosensors, are well known. The extensively examined aGFP folds efficiently
at or below room temperature, but fails to fold properly at higher temperatures. Aggregation of the
protein appears to occur when it is overexpressed in certain organisms, resulting in weak fluorescence.
In addition, the fluorescence of the native aGFP has a low quantum yield, which has prompted a search
for variants of aGFP with improved stability and fluorescence properties. Although expression of aGFP
is generally non-toxic to the cell in which it is expressed, there is some suggestion that aGFP is
cytotoxic and may induce apoptosis in expressing cells (Liu, H.S. et al. (1999) Biochem. Biophys. Res.
Commun. 260: 712-17). Finally, aGFP has been used as a scaffold for peptide display. However, some
peptide insertions at the surface loops of aGFP result in low fluorescence, which suggests that aGFP
may be sensitive to structural perturbations.

In view of the physical and biological properties of aGFP, other forms of GFP with fluorescence and
stability characteristics different from those of aGFP are desirable. Green fluorescent proteins have been
cloned from Renilla reniformis (hereinafter "rrGFP"), Renilla muelleri (hereinafter "rmGFP"), and
Ptilosarcus gurneyi (hereinafter "pGFP") (see WO 99/49019, hereby expressly incorporated by
reference). The core chromophore sequence of the rGFPs and pGFP is different from that of aGFP, and
the Renilla forms have fluorescence characteristics, with a higher molar absorbance coefficient and
narrower absorption/emission spectra, as compared to aGFP (Ward, W.W. et al. (1979) J. Biol. Chem.
254: 781-88). The lack of significant homology to aGFP suggests that the Renilla and Ptilosarcus forms
provide important alternatives to the extensively exploited aGFP. Accordingly, it is an object of the
present invention to provide compositions and methods comprising rGFP and pGFP.

SUMMARY OF THE INVENTION

In accordance with the objects outlined above, the present invention provides retroviral vectors
comprising a promoter and a rGFP and/or a pGFP nucleic acid. Additional nucleic acid vectors
embodied by this invention comprise a first gene of interest, a separation site, and a second gene of
interest, wherein the first or second gene of interest is a rGFP or pGFP gene. The separation site may
be an IRES element, a Type 2A sequence, or a protease recognition sequence. The gene of interest
may comprise reporter genes, selection genes, cDNAs, genomic DNAs, or random peptides.

In a preferred embodiment, the rmGFP or pGFP used in the vectors is codon optimized for
expression. That is, the rmGFP or pGFP are variants containing the preferred codons used in the
cells or organism in which the rmGFP or pGFP is to be expressed. In a preferred embodiment, the
rmGFP or pGFP is codon optimized for expression in mammalian cells, most preferably in human
cells.
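Codon optimization of this kind can be sketched as a back-translation in which each amino acid is encoded by a single preferred codon. The codon table below is an illustrative assumption (codons commonly cited as preferred in highly expressed human genes), not necessarily the codon usage behind the sequences of Figures 2 and 3, and `codon_optimize` is a hypothetical helper name. The optional glycine insertion mirrors the Gly added after the initiating Met in those figures; a practical design would also leave some codons non-optimal to create restriction sites.

```python
# Illustrative preferred-codon table (ASSUMPTION): one codon per amino acid,
# roughly the codons often cited as preferred in highly expressed human genes.
PREFERRED_HUMAN_CODONS = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",  # stop
}


def codon_optimize(protein: str, insert_gly_after_met: bool = False) -> str:
    """Back-translate a one-letter protein sequence using one preferred codon per residue.

    If insert_gly_after_met is True and the protein starts with Met, a Gly is
    inserted directly after the initiating Met before back-translation.
    """
    protein = protein.upper()
    if insert_gly_after_met and protein.startswith("M"):
        protein = "MG" + protein[1:]
    try:
        return "".join(PREFERRED_HUMAN_CODONS[aa] for aa in protein)
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]!r}") from None


# A short hypothetical peptide, with the extra Gly after Met.
print(codon_optimize("MSKQIL", insert_gly_after_met=True))  # ATGGGCAGCAAGCAGATCCTG
```

Using a single codon per residue is the simplest strategy; codon usage tables weighted by frequency, or avoidance of repeats and unwanted sites, are common refinements.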

In another preferred embodiment, the present invention provides for fusions of a gene of interest and
a gene encoding rmGFP or pGFP. The gene of interest may comprise cDNA, genomic DNA, or a
nucleic acid encoding a random peptide. In a preferred embodiment, the codons are optimized for
expression as described above.

In a further preferred embodiment, the fusion nucleic acids comprise a library of fusion nucleic acids.
That is, in one aspect, each member of the library may comprise a promoter, a first gene of interest, a
separation sequence, and a second gene of interest, wherein the first or second gene of interest
comprises a rGFP or pGFP. In another aspect, the library may comprise fusions of a gene of interest
and a gene encoding codon optimized rmGFP or pGFP. The present invention also provides for cells
and libraries of cells comprising either of these types of fusion nucleic acids.

In a preferred embodiment, the present invention also provides for methods of screening for bioactive
agents capable of altering a cell phenotype. The methods comprise contacting a cell, or a plurality of
cells, comprising a fusion nucleic acid comprising a promoter and a codon optimized rmGFP or pGFP
with at least one candidate agent, and screening the cells for an altered phenotype. Alternatively, the
cell comprises a fusion nucleic acid comprising a promoter, rGFP or pGFP, a separation sequence,
and a gene of interest.

In a preferred embodiment, the present invention provides a method of screening for bioactive agents
capable of inhibiting or activating a promoter. The method of screening comprises first combining a
candidate bioactive agent and a cell comprising a fusion nucleic acid comprising a promoter of interest
and a nucleic acid encoding either rGFP or pGFP, then optionally inducing the promoter and detecting
the presence of said rGFP or pGFP protein. In another aspect, the promoter is operably linked to a
fusion nucleic acid comprising a rGFP or pGFP, a separation sequence, and a gene of interest. The
gene of interest may comprise a reporter gene, a selection gene, or a nucleic acid encoding a
dominant effect protein.

In a further preferred embodiment, the method comprises screening for agents inhibiting or activating
an IL-4 inducible ε promoter. The method comprises first combining a candidate bioactive agent with
a cell comprising a fusion nucleic acid comprising an IL-4 inducible ε promoter operably linked to the
fusion nucleic acids described above; inducing said promoter with IL-4; and then detecting the
presence of said rGFP or pGFP protein. The absence of said rGFP or pGFP protein indicates that
said agent inhibits said IL-4 inducible ε promoter.
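The read-out logic of such a screen might be sketched as follows. This is a hypothetical analysis helper, not a procedure taken from the text. It assumes that the GFP signal is summarized as one number per agent (for example a geometric mean fluorescence), and that "absence" of rGFP or pGFP is operationalized as a signal close to the uninduced background. The cut-offs `inhibit_frac` and `activate_fold`, as well as the `Readout` and `classify_agents` names, are arbitrary illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Readout:
    agent: str             # candidate bioactive agent identifier (hypothetical)
    induced_signal: float  # GFP signal after IL-4 induction in the presence of the agent


def classify_agents(readouts, induced_control, uninduced_background,
                    inhibit_frac=0.2, activate_fold=2.0):
    """Call each agent an inhibitor, an activator, or no effect.

    Signals are normalized to the window between the uninduced background (0.0)
    and the IL-4-induced control without agent (1.0).
    """
    window = induced_control - uninduced_background
    if window <= 0:
        raise ValueError("induced control must exceed uninduced background")
    calls = {}
    for r in readouts:
        normalized = (r.induced_signal - uninduced_background) / window
        if normalized <= inhibit_frac:
            calls[r.agent] = "inhibitor"   # reporter essentially absent after induction
        elif normalized >= activate_fold:
            calls[r.agent] = "activator"   # reporter well above the induced control
        else:
            calls[r.agent] = "no effect"
    return calls


print(classify_agents(
    [Readout("A", 100.0), Readout("B", 900.0), Readout("C", 2500.0)],
    induced_control=1000.0, uninduced_background=50.0,
))
# {'A': 'inhibitor', 'B': 'no effect', 'C': 'activator'}
```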
The methods of screening for candidate agents altering a cell phenotype further comprise isolating
the cell with the altered phenotype and identifying the candidate agent responsible for producing the
altered phenotype.



Figure 1 shows an alignment of the amino acid sequences of anthozoan GFPs with the Aequorea
sequence, using the ClustalW program. The Renilla muelleri (RENM) and Ptilosarcus gurneyi (PTIL)
sequences are shown below the Aequorea GFP (AEQV) sequence at the bottom. The italicized
residues are the fluorescent tripeptide (chromophore). The sequences of the bottom four anthozoan
GFPs, which emit light between 483 and 506 nm, are from Matz, M. et al. (1999) Nature Biotech. 17:
969-973: ANEM, Anemonia majano GFP; DSFP, Discosoma striata GFP; FP48, Clavularia GFP; and
ZFP5, Zoanthus GFP. The first 35 residues are removed from the amino terminus of FP48. A consensus
residue was listed if at least 4 of the 7 residues were identical. Residues comprising turns and loops
between the β-strands in the Aequorea GFP, based on visual analysis of the Aequorea GFP crystal
structure (Yang et al. (1996) Nature Biotechnol. 14: 1246-51), are underlined. The two residues on
either side of the site of the inserted 22-mer peptide in the Renilla muelleri sequence are listed in bold
type and designated as loops A-F in bold. The corresponding replacement sites in Aequorea GFP that
allow formation of a fluorescent protein (Peelle, B. et al. (2001) Chem. Biol. 8: 521-34) are also shown
in bold.



Figure 2 compares the nucleic acid sequence of wild type (wt; lower sequence) Renilla muelleri GFP
and the variant sequence codon optimized (co; upper sequence) for expression in human cells. In the
codon optimized variant, 9 of the 239 amino acids are not encoded by preferred human codons, in
order to introduce restriction sites into the coding sequence. The codon optimized sequence has a
glycine inserted following the initiating methionine residue to provide further stability to the expressed
rmGFP.

Figure 3 compares the nucleic acid sequence of wild type (wt; lower sequence) Ptilosarcus gurneyi
GFP and a variant sequence codon optimized (co; upper sequence) for expression in human cells.
Similar to the codon optimized Renilla muelleri variant, the optimized Ptilosarcus GFP has 11 of the
239 amino acids not encoded by preferred human codons, in order to introduce restriction sites into
the coding sequence. As above, a glycine residue is inserted after the initiating methionine residue to
provide stability to the expressed pGFP.

Figure 4 shows the circular dichroism (CD) spectra of Aequorea victoria, Renilla muelleri, and
Ptilosarcus gurneyi GFPs. CD spectra were taken at pH 7.5 in 10 mM potassium phosphate buffer
with 0.1 M potassium fluoride and measured from 200 to 250 nm: EGFP (open circles), Renilla (open
grey squares) and Ptilosarcus (filled squares) GFPs. Deconvolution of these spectra indicates that the
secondary structure content of all three GFPs is identical.

Figure 5 shows the thermal denaturation curves for Aequorea victoria, Renilla muelleri, and Ptilosarcus
gurneyi GFPs as measured by CD. The most stable protein was Renilla GFP (open circles), with a Tm
of 86.1 °C, followed by EGFP (filled squares), with a Tm of 83.7 °C, and Ptilosarcus GFP (open
triangles), with a Tm of 80.5 °C.

Figure 6 gives the results of retroviral expression in human cells of human codon optimized Renilla
muelleri, Ptilosarcus gurneyi, and Aequorea victoria GFPs. The retroviral constructs were introduced
into Jurkat-E cells and examined by flow cytometry. FACS plots of wild type (wt, RcDNA) and codon
optimized Renilla muelleri GFP (R), Aequorea victoria GFP (E), and flag-tagged versions of Ptilosarcus
GFP (Pf), Renilla GFP (Rf) and Aequorea GFP (Ef) were obtained 4 days after infection. Both
Ptilosarcus and Renilla GFPs have higher fluorescence intensities than Aequorea GFP. Uninfected
cells are shown off scale because FL1 compensation on the cytometer shifted the dynamic range about
2.6 log units to the left. Geometric mean fluorescence values are listed in the upper right corner for
each population within the underlined gated region.
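The geometric mean fluorescence values used in these FACS comparisons are the exponential of the mean log intensity of the events in a gate. A minimal sketch follows, assuming that per-event intensities are available as positive floats and that the gate is a simple one-dimensional intensity interval (real gates are usually drawn on several parameters). The helper names are hypothetical. The last function expresses one population's geometric mean as a percentage of a parent population's, the kind of comparison made between an insert variant and its parent GFP.

```python
import math


def gate(intensities, low, high):
    """Keep events whose intensity falls inside [low, high] (a simple 1-D gate)."""
    return [x for x in intensities if low <= x <= high]


def geometric_mean(intensities):
    """exp(mean(log(x))); all intensities must be strictly positive."""
    if not intensities:
        raise ValueError("empty population")
    if any(x <= 0 for x in intensities):
        raise ValueError("geometric mean needs positive intensities")
    return math.exp(sum(math.log(x) for x in intensities) / len(intensities))


def percent_of_parent(variant, parent):
    """Geometric mean of a variant population as a percentage of its parent's."""
    return 100.0 * geometric_mean(variant) / geometric_mean(parent)


events = [5.0, 10.0, 100.0, 1000.0, 20000.0]
gated = gate(events, 10.0, 10000.0)       # [10.0, 100.0, 1000.0]
print(round(geometric_mean(gated), 6))    # 100.0
```

On log-scaled cytometry data, the geometric mean is less dominated by a few very bright events than the arithmetic mean would be.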

Figure 7 gives FACS analysis of Jurkat-E cell expression of Renilla GFP with a 22-mer HA epitope tag
inserted into positions A to F. Plots in column A are shown with a standard fluorescence scale. For
plots in column B, FL-1 channel compensation was used to shift the fluorescence detection range about
2.6 log units to the left in order to observe the high level of fluorescence. Renilla GFP is shown without
an insert (R), and with inserts in positions A, B, C, D, E, and F as labeled. Aequorea GFP is shown
without an insert (EGFP) and with the same insert in its equivalent position D (EGFP3). The sites of
insertion, A-F, are shown underlined in Figure 1. The constructs were retrovirally expressed in Jurkat-E cells and
analyzed by FACS 4 days post-infection. The GFP geometric mean fluorescence values from the 
gated regions are listed in the upper right of each plot. F, and EGFP3 retain 3CM9% of their 
respective parent GFP fluorescence levels. B, C, and E had observable but much lower levels of 
fluorescence than the parent Renilla GFP. The position A insert has almost no measurable 
fluorescence above background. 

Figure 8 shows fluorescence micrographs of cells expressing fusion proteins comprising peptides
inserted into sites D and F of Renilla muelleri GFP. The fusion proteins were retrovirally expressed in
A549 cells. Expression of a fusion protein comprising a hemagglutinin (HA) epitope inserted into
site D or F is shown in panels 1 and 2, respectively. Fluorescence occurs throughout the cell.
Expression of a fusion protein comprising the SV40-derived nuclear localization signal (NLS) inserted
into site D or F (panels 3 and 4, respectively) results in fluorescence only in the nucleus. The results
show that the displayed NLS peptide is functional when presented as a peptide inserted into a GFP
scaffold and that the GFP molecule retains its fluorescence.
Figure 9 shows various Type 2A separation sequences useful in the present invention. These Type
2A sequences are found in aphtho- and cardioviral genomes. The general sequence is
XXXXXXXXXXLXXDXEXNPGP, where X is any amino acid. Invariant amino acids are shown in bold.
Failure of peptide bond formation occurs at the junction between the carboxy-terminal glycine and
proline (underlined). The 2A sequence also shows a number of residues with conserved amino acid
substitutions.
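Because the general Type 2A sequence is a fixed-length pattern, candidate 2A sites in a protein sequence can be located with a regular expression. This is a minimal sketch, assuming one-letter amino acid codes. It enforces only the invariant positions of XXXXXXXXXXLXXDXEXNPGP, so it will miss natural 2A peptides that carry conserved substitutions at those positions. `find_2a_sites` is a hypothetical helper, and the test sequence in the example is made up.

```python
import re

# General Type 2A sequence: ten unconstrained residues, then L-x-x-D-x-E-x-N-P-G-P.
# The zero-width lookahead lets overlapping candidate sites be reported.
TWO_A_CONSENSUS = re.compile(r"(?=([A-Z]{10}L[A-Z]{2}D[A-Z]E[A-Z]NPGP))")


def find_2a_sites(protein):
    """Return (start, end, upstream_product, downstream_product) for each 2A match.

    Peptide bond formation fails between the final Gly and Pro of the motif, so,
    relative to the whole input, the upstream product ends in ...NPG and the
    downstream product begins with that Pro.
    """
    protein = protein.upper()
    sites = []
    for match in TWO_A_CONSENSUS.finditer(protein):
        start, end = match.start(1), match.end(1)
        split = end - 1  # index of the C-terminal proline of the motif
        sites.append((start, end, protein[:split], protein[split:]))
    return sites


# Made-up sequence: 4 residues, a consensus-conforming 2A motif, then 5 residues.
seq = "MSKG" + "AAAAAAAAAA" + "LAADAEANPGP" + "MVSKG"
for start, end, upstream, downstream in find_2a_sites(seq):
    print(start, end, upstream[-3:], downstream)  # 4 25 NPG PMVSKG
```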

DETAILED DESCRIPTION OF THE INVENTION 

The present invention is directed to the use of Renilla green fluorescent protein (hereinafter "rGFP") in
a variety of methods and compositions that exploit the autofluorescent properties of rGFP. These
methods include, but are not limited to, the use of rGFP as a reporter molecule in cell screening
assays, including intracellular assays; the use of rGFP as a scaffold protein for fusions with random
peptide libraries; etc. Similarly, compositions of rGFP are provided, including constructs of rGFP such
as fusion constructs that include rGFP as a reporter gene, retroviral constructs including rGFP and
separation sequences, etc. Basically, the invention provides a number of novel uses for rGFP, similar
to those outlined for aGFP in WO 95/07463, hereby incorporated by reference in its entirety. In
addition, the invention is also directed to the use of Ptilosarcus green fluorescent protein, the amino
acid sequence of which is shown in Figure 1 and is also depicted in WO 99/49019. It should be noted
that while the discussion below is generally directed to rGFP, pGFP may be used as well.

In a preferred embodiment, the invention provides compositions including rGFP. By "Renilla green
fluorescent protein" or "rGFP" herein is meant a protein that has significant homology, as defined
herein, to the wild-type Renilla reniformis or Renilla muelleri protein of Figure 1, both of which are
described in WO 99/49019, hereby incorporated by reference in its entirety.

In a preferred embodiment, the invention provides compositions including pGFP. By "Ptilosarcus
green fluorescent protein" or "pGFP" herein is meant a protein that has significant homology, as
defined herein, to the wild-type Ptilosarcus protein of Figure 1, as described in WO 99/49019,
hereby incorporated by reference in its entirety.

A rGFP or pGFP protein of the present invention may be identified in several ways. "Protein" in this
sense includes proteins, polypeptides, and peptides. A rGFP nucleic acid or protein is initially
identified by substantial nucleic acid and/or amino acid sequence homology to the sequences shown
in Figures 1 and 2. Such homology can be based upon the overall nucleic acid or amino acid
sequence. Similarly, a pGFP nucleic acid or protein is initially identified by substantial nucleic
acid and/or amino acid sequence homology to the sequences shown in Figures 1 and 3. Again,
such homology can be based upon the overall nucleic acid or amino acid sequence.
As used herein, a protein is a "rGFP protein" or "pGFP protein" if the overall homology of the protein
sequence to the respective amino acid sequences shown in Figure 1 is preferably greater than about
75%, more preferably greater than about 80%, even more preferably greater than about 85% and
most preferably greater than 90%. In some embodiments the homology will be as high as about 93 to
95 or 98%.
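The preference tiers above amount to a set of thresholds on the overall homology value. A small sketch, assuming the "about" cut-offs are treated as strict thresholds, which is a simplification of the text. `homology_tier` is a hypothetical name.

```python
def homology_tier(percent_homology):
    """Map an overall homology (in percent) onto the preference tiers named in the text."""
    if percent_homology > 90:
        return "most preferred"
    if percent_homology > 85:
        return "even more preferred"
    if percent_homology > 80:
        return "more preferred"
    if percent_homology > 75:
        return "preferred"
    return None  # below the lowest tier named in the text


for value in (97.0, 88.0, 78.5, 60.0):
    print(value, homology_tier(value))
```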

Homology in this context means sequence similarity or identity, with identity being preferred. This
homology will be determined using standard techniques known in the art, including, but not limited to,
the local homology algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2:482, the homology
alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443, the search for
similarity method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444, computerized
implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin
Genetics Software Package, Genetics Computer Group, 575 Science Drive, Madison, WI), or the Best
Fit sequence program described by Devereux, J. et al. (1984) Nucleic Acids Res. 12: 387-95,
preferably using the default settings, or by inspection.
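To make the global alignment approach of Needleman and Wunsch concrete, here is a compact dynamic-programming sketch. It uses a simple match/mismatch score and a linear gap penalty chosen for illustration. The programs named above use substitution matrices and affine gap penalties, so their scores and alignments will differ. `needleman_wunsch` and its default scores are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap penalty.

    Returns (aligned_a, aligned_b, score), using '-' for gap positions.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                            # gap in b
                score[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, s = needleman_wunsch("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(s)  # 4
```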

In a preferred embodiment, similarity is calculated by FastDB based upon the following parameters:
mismatch penalty of 1.0; gap size penalty of 0.33; and joining penalty of 30.0 ("Current Methods in
Sequence Comparison and Analysis," Macromolecule Sequencing and Synthesis, Selected Methods and
Applications, pp. 127-149, Alan R. Liss, Inc., 1988). Another example of a useful algorithm is PILEUP.
PILEUP creates a multiple sequence alignment from a group of related sequences using progressive,
pairwise alignments. It can also plot a tree showing the clustering relationships used to create the
alignment. PILEUP uses a simplification of the progressive alignment method of Feng and Doolittle
(1987) J. Mol. Evol. 35: 351-60; the method is similar to that described by Higgins and Sharp (1989)
CABIOS 5: 151-3. Useful PILEUP parameters include a default gap weight of 3.00, a default gap
length weight of 0.10, and weighted end gaps.

An additional example of a useful algorithm is the BLAST algorithm, described in Altschul, S.F. et al.
(1990) J. Mol. Biol. 215: 403-10 and Karlin et al. (1993) Proc. Natl. Acad. Sci. USA 90: 5873-87. A
particularly useful BLAST program is the WU-BLAST-2 program, which was obtained from Altschul et
al. (1996) Methods Enzymol. 266:460-80; http://blast.wustl/edu/blast/README.html. WU-BLAST-2
uses several search parameters, most of which are set to the default values. The adjustable
parameters are set with the following values: overlap span = 1, overlap fraction = 0.125, and word
threshold (T) = 11. The HSP S and HSP S2 parameters are dynamic values and are established by
the program itself depending upon the composition of the particular sequence and composition of the
particular database against which the sequence of interest is being searched; however, the values
may be adjusted to increase sensitivity. A % amino acid sequence identity value is determined by the
number of matching identical residues divided by the total number of residues of the "longer"
sequence in the aligned region. The "longer" sequence is the one having the most actual residues in
the aligned region (gaps introduced by WU-BLAST-2 to maximize the alignment score are ignored).
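The identity calculation just described can be sketched over an existing pairwise alignment. The sketch assumes the alignment is supplied as two equal-length strings with '-' for gaps, and it treats the whole string as the aligned region. `percent_identity` and its `denominator` argument are hypothetical names. The "shorter" option reflects the rule, given elsewhere in this description, that homology of a shorter sequence is determined using the number of amino acids in the shorter sequence.

```python
def percent_identity(aligned_a, aligned_b, denominator="longer"):
    """Percent identity over a pairwise alignment given as two equal-length strings.

    Identical columns are counted only where both sequences have a residue.
    The denominator is the number of actual (non-gap) residues in the longer
    sequence, or in the shorter one when denominator="shorter".
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    residues_a = len(aligned_a) - aligned_a.count("-")
    residues_b = len(aligned_b) - aligned_b.count("-")
    if denominator == "longer":
        total = max(residues_a, residues_b)
    elif denominator == "shorter":
        total = min(residues_a, residues_b)
    else:
        raise ValueError("denominator must be 'longer' or 'shorter'")
    if total == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * identical / total


print(round(percent_identity("MSKQILK", "MSR-ILK"), 2))                         # 71.43
print(round(percent_identity("MSKQILK", "MSR-ILK", denominator="shorter"), 2))  # 83.33
```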

In a similar manner, "percent (%) nucleic acid sequence identity" with respect to the coding sequence
of the polypeptides identified herein is defined as the percentage of nucleotide residues in a candidate
sequence that are identical with the nucleotide residues in the coding sequence of the rGFP or pGFP
proteins (Figure 1). A preferred method utilizes the BLASTN module of WU-BLAST-2 set to the
default parameters, with overlap span and overlap fraction set to 1 and 0.125, respectively.

An additional useful algorithm is gapped BLAST as reported by Altschul, S.F. et al. (1997) Nucleic
Acids Res. 25:3389-402. Gapped BLAST uses BLOSUM-62 substitution scores; a threshold T
parameter set to 9; the two-hit method to trigger ungapped extensions; charges gap lengths of k a
cost of 10+k; Xu set to 16; and Xg set to 40 for the database search stage and to 67 for the output
stage of the algorithms. Gapped alignments are triggered by a score corresponding to about 22 bits.
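The gapped BLAST charge of 10+k for a gap of length k is an affine gap cost: a fixed opening cost plus a per-residue extension cost. A minimal sketch of that arithmetic follows. The function name and the split into `open_cost` and `extend_cost` arguments are illustrative assumptions.

```python
def affine_gap_cost(length, open_cost, extend_cost):
    """Cost of a single gap of `length` residues: open_cost + extend_cost * length."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return open_cost + extend_cost * length


# Gapped BLAST setting quoted above: a gap of length k costs 10 + k.
BLAST_GAP = (10.0, 1.0)

# One long gap is cheaper than the same number of gapped residues split in two.
print(affine_gap_cost(4, *BLAST_GAP))      # 14.0
print(2 * affine_gap_cost(2, *BLAST_GAP))  # 24.0
```

This is why affine scoring tends to produce alignments with fewer, longer gaps, which better matches insertions and deletions that occur as single events.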

The alignment may include the introduction of gaps in the sequences to be aligned (see Figure 1). In
addition, for sequences which contain either more or fewer amino acids than the protein sequences
shown in Figure 1, it is understood that the percentage of homology will be determined based on the
number of homologous amino acids in relation to the total number of amino acids. Thus, for example,
homology of sequences shorter than that shown in Figure 1, as discussed below, will be determined
using the number of amino acids in the shorter sequence.

The rGFP and pGFP proteins of the present invention may be shorter or longer than the amino acid
sequences shown in Figure 1. Thus, in a preferred embodiment, included within the definition of rGFP
and pGFP proteins are portions or fragments of the sequences depicted herein. Portions or
fragments of rGFP or pGFP proteins are considered rGFP or pGFP proteins if a) they share at least
one antigenic epitope; b) they have at least the indicated homology; c) they preferably have rGFP or
pGFP biological activity, including, but not limited to, autofluorescence; or d) they fold into a stable
structure that is similar to the wild-type rGFP or pGFP structure.

For example, rGFP or pGFP deletion mutants can be made. At the N-terminus, it is known that only
the first amino acid of the aGFP protein may be deleted without loss of fluorescence. At the C-terminus
of aGFP, up to 7 residues can be deleted without loss of fluorescence (see Phillips, G.N.
(1997) Curr. Opin. Struct. Biol. 7: 821-27). This presumably applies to rGFP and pGFP as well.

In one embodiment, the rGFP or pGFP proteins are derivative or variant rGFP or pGFP proteins. That
is, as outlined more fully below, the derivative rGFP or pGFP will contain at least one amino acid
substitution, deletion or insertion, with amino acid substitutions being particularly preferred. The amino
acid substitution, insertion or deletion may occur at any residue within the rGFP or pGFP protein.
These variants ordinarily are prepared by site specific mutagenesis of nucleotides in the DNA
encoding the GFP proteins, using cassette or PCR mutagenesis, DNA shuffling mutagenesis, or other
techniques well known in the art, to produce DNA encoding the variant, and thereafter expressing the
DNA in recombinant cells as is known in the art and outlined herein. However, variant rGFP or pGFP
protein fragments having up to about 100-150 residues may be prepared by in vitro synthesis using
established techniques. Amino acid sequence variants are characterized by the predetermined nature
of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation
of the rGFP or pGFP protein amino acid sequence. The variants typically exhibit the same qualitative
biological activity as the naturally occurring analogue, although variants can also be selected which
have modified characteristics as will be more fully outlined below. That is, in a preferred embodiment,
when non-wild-type rGFP or pGFP is used, the derivative preferably has at least 1% of wild-type
fluorescence, with at least about 10% being preferred, at least about 50-60% being particularly
preferred and 95% to 98% to 100% being especially preferred. In general, what is important is that
there is enough fluorescence to allow sorting and/or detection above background, for example using a
fluorescence-activated cell sorter (FACS) machine. However, in some embodiments, for example
when fusion proteins with rGFP or pGFP are made, it is possible to detect the fusion proteins
non-fluorescently using, for example, antibodies directed either to an epitope tag (i.e., a purification
sequence) or to the rGFP or pGFP itself. In this case, the rGFP or pGFP scaffold does not have to be
fluorescent, if it can be shown that the rGFP or pGFP is folding correctly and/or reproducibly.

Thus, the rGFP or pGFP may be wild type or variants thereof. These variants fall into one or more of
three classes: substitutional, insertional or deletional variants. These variants ordinarily are prepared
by site specific mutagenesis of nucleotides in the DNA encoding the GFP, using cassette or PCR
mutagenesis or other techniques well known in the art, to produce DNA encoding the variant, and
thereafter expressing the DNA in recombinant cell culture as outlined herein. However, variant protein
fragments having up to about 100-150 residues may be prepared by in vitro synthesis using
established techniques. Amino acid sequence variants are characterized by the predetermined nature
of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation
of the rGFP or pGFP amino acid sequences. The variants typically exhibit the same qualitative
biological activity as the naturally occurring analogue, although variants can also be selected which
have modified characteristics as will be more fully outlined below.

While the site or region for introducing an amino acid sequence variation is predetermined, the
mutation per se need not be predetermined. For example, in order to optimize the performance of a
mutation at a given site, random mutagenesis may be conducted at the target codon or region and the
expressed scaffold variants screened for the optimal combination of desired activity. Techniques for
making substitution mutations at predetermined sites in DNA having a known sequence are well
known, for example, M13 primer mutagenesis and PCR mutagenesis. Screening of the mutants is
done using assays of scaffold protein activities.
Amino acid substitutions are typically of single residues; insertions usually will be on the order of from
about 1 to 20 amino acids, although considerably larger insertions may be tolerated. Deletions range
from about 1 to about 20 residues, although in some cases deletions may be much larger.

Substitutions, deletions, insertions or any combination thereof may be used to arrive at a final
derivative. Generally these changes are made at a few amino acids to minimize the alteration of the
molecule. However, larger changes may be tolerated in certain circumstances. When small
alterations in the characteristics of the rGFP or pGFP protein are desired, substitutions are generally
made in accordance with the following table:

TABLE I

Original Residue    Exemplary Substitutions
Ala                 Ser
Arg                 Lys
Asn                 Gln, (His)
Asp                 Glu
Cys                 Ser
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Tyr, (Asn), (Gln)
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, (Gln), (Glu)
Met                 Leu, Ile
Phe                 Tyr, Trp, (Met), (Leu)
Ser                 Thr
Thr                 Ser
Trp                 Tyr, Phe
Tyr                 Trp, Phe
Val                 Ile, Leu

Less favored substitutions are given in parentheses. Substantial changes in function or immunological
identity are made by selecting substitutions that are less conservative than those shown in Table I.
For example, substitutions may be made that more significantly affect the structure of the polypeptide
backbone in the area of the alteration (for example, the alpha-helical or beta-sheet structure), the
charge or hydrophobicity of the molecule at the target site, or the bulk of the side chain. In general, the
substitutions expected to produce the greatest changes in the polypeptide's properties are those in
which (a) a hydrophilic residue, e.g., seryl or threonyl, is substituted for (or by) a hydrophobic residue
(e.g., leucyl, isoleucyl, phenylalanyl, valyl or alanyl); (b) a cysteine or proline is substituted for (or by)
any other residue; (c) a residue having an electropositive side chain (e.g., lysyl, arginyl, or histidyl) is
substituted for (or by) an electronegative residue (e.g., glutamyl or aspartyl); or (d) a residue having a
bulky side chain (e.g., phenylalanine) is substituted for (or by) one not having a side chain (i.e.,
glycine).
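Table I and categories (a) to (d) can be combined into a simple lookup that labels a proposed substitution. A minimal sketch follows, with these assumptions: the residue groups contain only the example residues named in (a) to (d), so they are not exhaustive; Table I is consulted first, so pairs such as Gly to Pro or Cys to Ser are still reported as "exemplary" even though (b) would flag them; and `classify_substitution` is a hypothetical name.

```python
# Table I: original residue -> (exemplary substitutions, less favored substitutions).
TABLE_I = {
    "Ala": ({"Ser"}, set()),
    "Arg": ({"Lys"}, set()),
    "Asn": ({"Gln"}, {"His"}),
    "Asp": ({"Glu"}, set()),
    "Cys": ({"Ser"}, set()),
    "Gln": ({"Asn"}, set()),
    "Glu": ({"Asp"}, set()),
    "Gly": ({"Pro"}, set()),
    "His": ({"Tyr"}, {"Asn", "Gln"}),
    "Ile": ({"Leu", "Val"}, set()),
    "Leu": ({"Ile", "Val"}, set()),
    "Lys": ({"Arg"}, {"Gln", "Glu"}),
    "Met": ({"Leu", "Ile"}, set()),
    "Phe": ({"Tyr", "Trp"}, {"Met", "Leu"}),
    "Ser": ({"Thr"}, set()),
    "Thr": ({"Ser"}, set()),
    "Trp": ({"Tyr", "Phe"}, set()),
    "Tyr": ({"Trp", "Phe"}, set()),
    "Val": ({"Ile", "Leu"}, set()),
}

# Example residues named in categories (a)-(d); not exhaustive.
HYDROPHILIC = {"Ser", "Thr"}
HYDROPHOBIC = {"Leu", "Ile", "Phe", "Val", "Ala"}
ELECTROPOSITIVE = {"Lys", "Arg", "His"}
ELECTRONEGATIVE = {"Glu", "Asp"}


def _swaps_between(a, b, group1, group2):
    """True if one residue is in group1 and the other in group2 (either direction)."""
    return (a in group1 and b in group2) or (a in group2 and b in group1)


def classify_substitution(original, new):
    """Label a substitution (three-letter codes) using Table I, then categories (a)-(d)."""
    if original == new:
        return "identical"
    exemplary, less_favored = TABLE_I.get(original, (set(), set()))
    if new in exemplary:
        return "exemplary"
    if new in less_favored:
        return "less favored"
    drastic = (
        _swaps_between(original, new, HYDROPHILIC, HYDROPHOBIC)             # (a)
        or bool({original, new} & {"Cys", "Pro"})                            # (b)
        or _swaps_between(original, new, ELECTROPOSITIVE, ELECTRONEGATIVE)  # (c)
        or _swaps_between(original, new, {"Phe"}, {"Gly"})                   # (d)
    )
    return "drastic" if drastic else "non-conservative"


print(classify_substitution("Ile", "Leu"))  # exemplary
print(classify_substitution("Ser", "Leu"))  # drastic
```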

As outlined above, the variants typically exhibit the same qualitative biological activity (i.e.,
fluorescence), although variants also are selected to modify the characteristics of the rGFP or pGFP
protein as needed.

In a preferred embodiment, specific residues of rGFP or pGFP protein are substituted, resulting in
proteins with modified characteristics. Such substitutions may occur at one or more residues, with 1-10
substitutions being preferred. Preferred characteristics to be modified include range of spectral
emission, including shifts in excitation spectrum and emission spectrum, rate of folding, stability,
solubility, expression levels, toxicity, sensitivity to ions such as halide ions, and emission intensity. As
is known in the art, there are a number of aGFP variants with desirable properties, and the
corresponding rGFP and pGFP amino acid residues may be varied in the same way.
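Carrying an aGFP mutation over to the Renilla or Ptilosarcus proteins requires the residue correspondences given in the preferred embodiments of this section. The sketch below stores them as a lookup table. The values are transcribed from those embodiments, with the unqualified "rGFP" numbering in two of them read as rmGFP numbering. `AGFP_TO_RENILLA` and `renilla_positions` are hypothetical names.

```python
# aGFP residue number -> (rmGFP/pGFP residue number, rrGFP residue number),
# transcribed from the preferred embodiments in this section.
AGFP_TO_RENILLA = {
    43: (46, 43), 64: (68, 65), 65: (69, 66), 66: (70, 67), 68: (72, 69),
    72: (76, 73), 99: (101, 98), 123: (125, 124), 145: (147, 146),
    146: (148, 147), 148: (150, 149), 153: (155, 154), 163: (162, 161),
    167: (166, 165), 175: (174, 173), 202: (200, 199), 203: (201, 200),
    205: (203, 202), 212: (210, 209),
}


def renilla_positions(agfp_residue):
    """Return {'rmGFP/pGFP': n, 'rrGFP': m} for a listed aGFP residue number."""
    if agfp_residue not in AGFP_TO_RENILLA:
        raise KeyError(f"no correspondence listed for aGFP residue {agfp_residue}")
    rm_p, rr = AGFP_TO_RENILLA[agfp_residue]
    return {"rmGFP/pGFP": rm_p, "rrGFP": rr}


# The offset between numbering schemes changes along the sequence,
# so a lookup table is used instead of adding a fixed number.
for pos in (65, 163, 212):
    rm_p, _ = AGFP_TO_RENILLA[pos]
    print(pos, renilla_positions(pos), "rmGFP offset:", rm_p - pos)
```

The varying offsets (+4 at aGFP 65, -1 at 163, -2 at 212) reflect gaps in the alignment of Figure 1, which is why a fixed shift would misplace mutations.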

In a preferred embodiment, residue 46 of rmGFP and pGFP, and residue 43 of rrGFP (corresponding to
residue 43 of aGFP), is substituted with a Thr or an Ala.

In a preferred embodiment, residue 68 of rmGFP and pGFP, and residue 65 of rrGFP (corresponding to
residue 64 of aGFP), is substituted with a Leu or Val.

In a preferred embodiment, residue 69 of rmGFP and pGFP, and residue 66 of rrGFP (corresponding to
residue 65 of aGFP), is substituted with a Thr, Ile, Cys, Ser, Leu, Ala or Gly.

In a preferred embodiment, residue 70 of rmGFP and pGFP, and residue 67 of rrGFP (corresponding to
residue 66 of aGFP), is substituted with a His, Phe, or Trp.

In a preferred embodiment, residue 72 of rmGFP and pGFP, and residue 69 of rrGFP (corresponding to
residue 68 of aGFP), is substituted with a Val or Leu.


In a preferred embodiment, residue 76 of rmGFP and pGFP, and residue 73 of rrGFP (corresponding to
residue 72 of aGFP) is substituted with a Ser or Ala.

In a preferred embodiment, residue 101 of rmGFP and pGFP, and residue 98 of rrGFP (corresponding to
residue 99 of aGFP) is substituted with a Phe or Ser.

In a preferred embodiment, residue 125 of rmGFP and pGFP, and residue 124 of rrGFP
(corresponding to residue 123 of aGFP) is substituted with an Ile.

In a preferred embodiment, residue 147 of rmGFP and pGFP, and residue 146 of rrGFP (corresponding
to residue 145 of aGFP) is substituted with a Tyr, Phe, or His.

In a preferred embodiment, residue 148 of rmGFP and pGFP, and residue 147 of rrGFP (corresponding
to residue 146 of aGFP) is substituted with an Asn or Ile.

In a preferred embodiment, residue 150 of rmGFP and pGFP, and residue 149 of rrGFP
(corresponding to residue 148 of aGFP) is substituted with a His or Arg.

In a preferred embodiment, residue 155 of rmGFP and pGFP, and residue 154 of rrGFP (corresponding
to residue 153 of aGFP) is substituted with a Thr or Ala.

In a preferred embodiment, residue 162 of rmGFP and pGFP, and residue 161 of rrGFP
(corresponding to residue 163 of aGFP) is substituted with a Val or Ala.

In a preferred embodiment, residue 166 of rmGFP and pGFP, and residue 165 of rrGFP
(corresponding to residue 167 of aGFP) is substituted with an Ile or Thr.

In a preferred embodiment, residue 174 of rmGFP and pGFP, and residue 173 of rrGFP
(corresponding to residue 175 of aGFP) is substituted.

In a preferred embodiment, residue 200 of rmGFP and pGFP, and residue 199 of rrGFP (corresponding to
residue 202 of aGFP) is substituted with a Ser or Phe.

In a preferred embodiment, residue 201 of rmGFP and pGFP, and residue 200 of rrGFP
(corresponding to residue 203 of aGFP) is substituted with an Ile, Thr, or Tyr.

In a preferred embodiment, residue 203 of rmGFP and pGFP, and residue 202 of rrGFP
(corresponding to residue 205 of aGFP) is substituted with a Ser or Thr.

In a preferred embodiment, residue 210 of rmGFP and pGFP, and residue 209 of rrGFP
(corresponding to residue 212 of aGFP) is substituted with an Asn or Val.

In a preferred embodiment, residue 218 of rmGFP and pGFP, and residue 216 of rrGFP
(corresponding to residue 222 of aGFP) is substituted with a Gly or Ser.

In addition, rGFP or pGFP proteins can be made that are longer than the wild type, for example, by
the addition of epitope or purification tags, the addition of other fusion sequences, etc., as is more fully
outlined below.

In another preferred embodiment, GFP variants as used herein include GFPs containing codons
replaced with degenerate codons coding for the same amino acid. This arises from the degeneracy of
the genetic code, in which the same amino acid is encoded by alternative codons. Replacing one
codon with another degenerate codon changes the nucleotide sequence without changing the amino
acid residue. An extremely large number of nucleic acids may be made, all of which encode the GFPs
of the present invention. Thus, having identified a particular amino acid sequence, those skilled in the
art could make any number of different nucleic acids by simply modifying the sequence of one or
more codons in a way that does not change the amino acid sequence of the protein. In this regard,
the present invention specifically contemplates each and every possible variation of
polynucleotides that could be made by selecting combinations based on the possible codon choices,
and all such variations are to be considered specifically disclosed and equivalent to the sequences of
Figure 1. It also should be noted that codon optimization resulting in one or a small number of amino
acid changes, particularly conservative changes, is also possible.
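The degeneracy argument can be made concrete with a short sketch. It assumes the standard genetic code and in-frame sequences without stop-codon handling; the function names (`translate`, `synonymous_codons`, `count_encodings`) are illustrative only. It shows two different nucleotide sequences encoding the same peptide and counts how many sequences encode a given peptide, which is why the number of possible encoding nucleic acids is so large.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame coding sequence codon by codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def synonymous_codons(aa):
    """All codons that encode the one-letter amino acid `aa`."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def count_encodings(protein):
    """Number of distinct nucleotide sequences encoding `protein` (stop codon not included)."""
    return math.prod(len(synonymous_codons(aa)) for aa in protein.upper())


# Different codons for Leu and Ser, same peptide.
print(translate("ATGTTATCT"), translate("ATGCTGAGC"))  # MLS MLS
print(count_encodings("MLS"))  # 36 = 1 (Met) x 6 (Leu) x 6 (Ser)
```

Because the count is a product over residues, it grows exponentially with protein length, so even a short GFP-sized protein has an astronomically large set of synonymous coding sequences.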

Changing the codons may be desirable in a variety of situations. For example, substitution with a
degenerate codon is useful for eliminating cryptic splice signals present in the coding regions of a
gene, inserting restriction sites in the gene, distinguishing one version of the same gene from
another (e.g., by hybridization), creating alternative primers for amplification reactions, examining
mutational bias in genes, changing chromosomal methylation patterns (e.g., for determining
preferential parental transmission), and changing the expression levels of the gene of interest.

Accordingly, in a further preferred embodiment, the GFP variants are codon optimized for expression
in a particular organism. By "codon optimized" herein is meant changes in the codons of the gene of
interest to those preferentially used in a particular organism, such that the gene is efficiently expressed
in the organism. Although the genetic code is degenerate in that most amino acids are represented by
several codons, called synonyms or synonymous codons, it is well known that codon usage by
particular organisms is nonrandom and biased towards particular codon triplets. This codon usage
bias may be higher in reference to a given gene, genes of common function or ancestral origin, highly
expressed proteins versus low copy number proteins, and the aggregate protein coding regions of an
organism's genome. Although codon bias may arise from nucleotide composition or mutational biases
in different organisms, codon usage bias in bacteria and yeast correlates with the abundance of tRNA
species in the cell. In general, codon bias is often associated with the level of gene expression. That
is, certain codons are preferentially represented in the protein coding regions of highly expressed gene
products. Thus, changing the codons to the preferred codons of a particular organism may allow
higher level expression of the encoded protein in that organism. In this regard, the present invention
relates to GFP variants whose codons are altered to the preferred codons of the organism in which the
gene of interest is being expressed. In other words, codons are preferably selected to fit the host cell
in which the protein is being produced. For example, preferred codons used in bacteria are used to
express the gene in bacteria; preferred codons used in yeast are used for expression in yeast; and
preferred codons used in mammalian cells are used for expression in mammalian cells.

By "preferred", "optimal" or "favored" codons, or "high codon usage bias" or grammatical equivalents,
as used herein is meant codons used at higher frequency in the protein coding regions than other
codons that code for the same amino acid. The preferred codons may be determined in relation to
codon usage in a single gene, a set of genes of common function or origin, highly expressed genes,
the codon frequency in the aggregate protein coding regions of the whole organism, codon frequency
in the aggregate protein coding regions of related organisms, or combinations thereof.

In a preferred embodiment, preferred or favored codons are determined for genes of common
function, while in a more preferred embodiment, preferred codons are determined for protein coding
regions of the whole organism or related organisms. In a most preferred embodiment, codon usage in
a representative number of highly expressed gene products of an organism or related organisms
provides the basis for determining the set of preferred codons. Thus, in one aspect, preferred codons
are those codons whose frequency increases with the level of gene expression. Since gene
expression may be restricted to specific cells or certain developmental time periods (e.g., embryonic
and adult), whether a gene is highly expressed is measured with respect to the cells and the temporal
periods in which the gene is expressed.

In another aspect, preferred codons are further delineated with respect to the size of the protein
coding regions examined. Studies of codon bias show a negative correlation between the size of the
protein and codon usage bias (see Duret, L. et al. (1999) Proc. Natl. Acad. Sci. USA 96: 4482-87). For
proteins of increasing length, there is a tendency toward less codon usage bias, while highly expressed
proteins of decreasing length display increased codon usage bias. Thus, in a preferred embodiment,
the proteins used for assessing preferred codons include proteins of all lengths, while a more
preferred embodiment uses proteins of up to about 550 amino acids. In the most preferred
embodiment, proteins of up to about 335 amino acids are used.
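A minimal sketch of deriving preferred codons from a reference set of coding sequences, in the manner of Table II (relative frequency of each codon among its synonyms). The optional `max_aa` cutoff mirrors the length restriction described above (e.g., about 335 amino acids); for simplicity it counts every codon, including the stop codon. The toy reference sequences, the function names and the tie-breaking rule (the first codon seen wins) are assumptions for illustration, and sequences are assumed to be in frame and to contain only A, C, G and T.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds_list, max_aa=None):
    """Pool codon counts over reference CDSs, skipping any longer than `max_aa` codons."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        if max_aa is not None and len(cds) // 3 > max_aa:
            continue
        for i in range(0, len(cds) - 2, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def relative_frequencies(counts):
    """Frequency of each codon among the synonymous codons of its amino acid."""
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


def preferred_codons(counts):
    """Most frequently used codon for each amino acid (stop codons ignored)."""
    best = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        if aa != "*" and (aa not in best or n > counts[best[aa]]):
            best[aa] = codon
    return best


reference = ["ATGCTGCTGTTATAA", "ATGCTGAGCTAA"]  # toy CDSs, not real genes
counts = codon_counts(reference)
print(relative_frequencies(counts)["CTG"])  # 0.75
print(preferred_codons(counts))  # {'M': 'ATG', 'L': 'CTG', 'S': 'AGC'}
```

Passing `max_aa=4` here drops the first (five-codon) sequence, which shows how a length cutoff changes the data set from which preferences are drawn.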

A variety of methods are known for determining the codon frequency (e.g., codon usage, relative
synonymous codon usage) and codon preference in specific organisms, including multivariate analysis,
for example, using cluster analysis or correspondence analysis, and the effective number of codons
used in a gene (see GCG CodonPreference, Genetics Computer Group Wisconsin Package;
CodonW, John Peden, University of Nottingham; McInerney, J.O. (1998) Bioinformatics 14: 372-73;
Stenico, M. et al. (1994) Nucleic Acids Res. 22: 2437-46; Wright, F. (1990) Gene 87: 23-29). Codon
usage tables are available for a growing list of organisms (see, for example, Wada, K. et al. (1992)
Nucleic Acids Res. 20: 2111-2118; Nakamura, Y. et al. (2000) Nucleic Acids Res. 28: 292; Duret et al.,
supra). The data source for obtaining codon usage may rely on any available nucleotide sequence
capable of coding for a protein. These data sets include nucleic acid sequences actually known to
encode expressed proteins (e.g., complete protein coding sequences, CDS), expressed sequence
tags (ESTs), or predicted coding regions of genomic sequences (see, for example, Mount, D.
Bioinformatics: Sequence and Genome Analysis, Chapter 8, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, New York, 2001; Uberbacher, E.C. (1996) Methods Enzymol. 266: 259-281;
Tiwari, S. et al. (1997) Comput. Appl. Biosci. 13: 263-270). Accordingly, the present invention relates
to codon optimization for enhancing expression of a gene in any host organism.
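As one example of these methods, the sketch below computes the effective number of codons (Nc) following Wright (1990), cited above. For each amino acid seen n times with synonymous codon frequencies p_i, the homozygosity is F = (n·Σp_i² − 1)/(n − 1); F is averaged within each degeneracy class of the standard code, and Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. Nc runs from 20 (one codon per amino acid, maximal bias) to 61 (no bias). This is a simplified sketch under stated assumptions: sequences contain only A, C, G and T, and it raises an error when a degeneracy class has no amino acid observed at least twice rather than applying any correction for missing classes.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degeneracy classes of the standard code; Met and Trp (one codon each) contribute 2.
CLASSES = {2: "FYCHQNKDE", 3: "I", 4: "VPTAG", 6: "LSR"}


def homozygosity(counts):
    """Wright's F for one amino acid, or None if it was observed fewer than twice."""
    n = sum(counts)
    if n < 2:
        return None
    # (n * sum(p_i^2) - 1) / (n - 1), rewritten with integer counts c_i = n * p_i.
    return (sum(c * c for c in counts) - n) / (n * (n - 1))


def effective_number_of_codons(cds):
    usage = defaultdict(Counter)
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if CODON_TABLE[codon] != "*":
            usage[CODON_TABLE[codon]][codon] += 1
    nc = 2.0
    for residues in CLASSES.values():
        fs = []
        for aa in residues:
            f = homozygosity(list(usage[aa].values()))
            if f is not None:
                fs.append(f)
        if not fs:
            raise ValueError("no amino acid of a degeneracy class was observed twice")
        f_avg = sum(fs) / len(fs)
        nc += len(residues) / f_avg if f_avg > 0 else float("inf")
    return min(nc, 61.0)


# Each amino acid always uses the same codon (two occurrences each): maximal bias.
one_codon_each = "".join(
    next(c for c, a in CODON_TABLE.items() if a == aa) * 2 for aa in "FYCHQNKDEIVPTAGLSR"
)
print(effective_number_of_codons(one_codon_each))  # 20.0
# Every sense codon used exactly once: no bias.
every_codon_once = "".join(c for c, a in CODON_TABLE.items() if a != "*")
print(effective_number_of_codons(every_codon_once))  # 61.0
```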

In a preferred embodiment, codons in the nucleotide sequence of rGFP or pGFP are substituted with
codons preferentially used in the organism in which the GFP is to be expressed. In identifying the codons
for modification or replacement, the codons of rGFP or pGFP (or any other protein coding region) are
compared to the codons favored or preferred in the organism of interest. This analysis identifies
differences between the preferred set of codons and the codons actually used, and thus identifies
nucleotides for substitution. In a preferred embodiment, codons in rGFP or pGFP that are the least
preferred codons in the subject organism are selected for substitution. Further substitutions are made
for frequently occurring codons in rGFP or pGFP that are not the preferred codons. Although these
frequently occurring codons may not be the least preferred codons, the presence of numerous
non-preferred or non-optimal codons can limit efficient expression of the protein product.
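The comparison step can be sketched as follows, assuming a host usage table of relative synonymous codon frequencies. Here the table is a small excerpt of the human values in Table II (Met, Leu and Ser only), the 0.10 cutoff for a "least preferred" codon is a hypothetical choice, and the function names are illustrative rather than part of any described protocol.

```python
# Excerpt of human relative codon frequencies from Table II: amino acid -> {codon: frequency}.
HUMAN_USAGE = {
    "M": {"ATG": 1.00},
    "L": {"TTA": 0.06, "TTG": 0.12, "CTT": 0.12, "CTC": 0.20, "CTA": 0.07, "CTG": 0.43},
    "S": {"TCT": 0.18, "TCC": 0.23, "TCA": 0.15, "TCG": 0.06, "AGT": 0.14, "AGC": 0.25},
}


def split_codons(cds):
    """Split an in-frame coding sequence into codons."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]


def flag_rare_codons(cds, usage, threshold=0.10):
    """Return (codon index, codon, host frequency) for codons used below `threshold`."""
    codon_to_aa = {c: aa for aa, table in usage.items() for c in table}
    flagged = []
    for i, codon in enumerate(split_codons(cds)):
        freq = usage[codon_to_aa[codon]][codon]
        if freq < threshold:
            flagged.append((i, codon, freq))
    return flagged


def replace_with_preferred(cds, positions, usage):
    """Swap the codons at `positions` for the host's most frequent synonym."""
    codon_to_aa = {c: aa for aa, table in usage.items() for c in table}
    codons = split_codons(cds)
    for i in positions:
        synonyms = usage[codon_to_aa[codons[i]]]
        codons[i] = max(synonyms, key=synonyms.get)
    return "".join(codons)


gene = "ATGTTATCGCTG"  # toy sequence: Met-Leu-Ser-Leu
rare = flag_rare_codons(gene, HUMAN_USAGE)
print(rare)  # [(1, 'TTA', 0.06), (2, 'TCG', 0.06)]
print(replace_with_preferred(gene, [i for i, _, _ in rare], HUMAN_USAGE))  # ATGCTGAGCCTG
```

A second pass over codons that are frequent in the gene but not preferred in the host would follow the same pattern with a different selection rule.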

When several preferred codons are available for the same amino acid, the choice of substitution can
rely on other considerations, such as ease of constructing the variant, concerns for limiting introduction
of mutations during propagation of the gene in the host organism (i.e., mutational bias), secondary
structure of the mRNA that may affect expression levels, and concern for generating splice sites.
Other considerations may take into account the intended uses of the codon optimized variants, such
as insertion of restriction sites for generating fusion proteins. Thus, some deviations from strict
adherence to preferred codons are permissible to accommodate restriction sites in the resulting gene
for the purposes of constructing the variant, replacing gene segments (e.g., to simplify insertion
of mutated gene segments), and creating fusion proteins, as described below.
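When restriction sites are deliberately retained or introduced, a simple check after each round of codon changes is to scan both strands for the recognition sequences of interest. The sketch below does this; EcoRI (GAATTC) and BamHI (GGATCC) are used only as familiar examples and are not stated to be the sites used for the constructs described here. The function names are illustrative, and reverse-strand matches are reported by their start position on the forward strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq, enzymes):
    """Map enzyme name -> sorted 0-based start positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = set()
        # Search the site and its reverse complement (identical for palindromic sites).
        for motif in {site.upper(), reverse_complement(site)}:
            start = seq.find(motif)
            while start != -1:
                positions.add(start)
                start = seq.find(motif, start + 1)
        hits[name] = sorted(positions)
    return hits


ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}
print(find_sites("ATGGAATTCGGATCC", ENZYMES))  # {'EcoRI': [3], 'BamHI': [9]}
```

A site that must be kept can then be protected from further codon swaps, while an unwanted match points to the codons whose synonymous alternatives should be tried instead.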

In certain embodiments, not all codons need to be replaced to optimize the codon usage of the GFP,
since the natural sequence will already comprise some of the preferred codons and because use of
preferred codons may not be required for all amino acid residues. In one aspect, about 10% to about
35% of the codons are replaced or changed. Additional changes may be introduced to maximize
expression. Consequently, codon optimized GFP sequences may contain preferred codons at about
40%, 50%, 60%, 70%, 80%, or greater than 90% of codon positions of the full length coding region.
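The two percentages above (codons changed, and codon positions occupied by preferred codons) can be computed directly. In this minimal sketch a codon counts as "preferred" if it is the highest-frequency human codon for its amino acid in Table II; that definition, the toy sequences and the function names are assumptions for illustration, and a stop codon would simply count as non-preferred.

```python
# Highest-frequency human codon for each amino acid, read off Table II.
PREFERRED_HUMAN = {
    "TTC", "CTG", "AGC", "TAC", "TGC", "TGG", "CCC", "CAC", "CAG", "AGG",
    "ATC", "ATG", "ACC", "AAC", "AAG", "GTG", "GCC", "GAC", "GAG", "GGC",
}


def split_codons(cds):
    """Split an in-frame coding sequence into codons."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]


def percent_preferred(cds, preferred=PREFERRED_HUMAN):
    """Percentage of codon positions occupied by a preferred codon."""
    codons = split_codons(cds)
    return 100.0 * sum(c in preferred for c in codons) / len(codons)


def percent_changed(original, optimized):
    """Percentage of codon positions that differ between two versions of a gene."""
    a, b = split_codons(original), split_codons(optimized)
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of codons")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


original, optimized = "ATGTTATCGCTG", "ATGCTGAGCCTG"  # toy Met-Leu-Ser-Leu gene
print(percent_changed(original, optimized))  # 50.0
print(percent_preferred(original), percent_preferred(optimized))  # 50.0 100.0
```

By the same measure, the codon optimized rmGFP sequence described in this section, in which 9 of 239 codons are not the preferred human codons, has 230/239, or about 96.2%, preferred codons, and the pGFP sequence (11 of 239 non-preferred) has 228/239, or about 95.4%.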

Preferred genes of interest are codon optimized for prokaryotes or eukaryotes. Prokaryotes may
comprise, among others, bacteria, including Bacillus (for example, subtilis, anthracis), Clostridia,
Staphylococcus, Streptococcus, Neisseria, Erysipelothrix, Listeria, Nocardia, Salmonella, Shigella,
Escherichia, Klebsiella, Enterobacter, Serratia, Proteus, Morganella, Providencia, Yersinia,
Haemophilus, Brucella, Francisella, Vibrio, Pseudomonas, Campylobacter, Clostridium, Actinomyces,
Corynebacterium, Bacteroides, Mycobacterium (for example, tuberculosis, leprae); spirochetes,
including Treponema, Borrelia, Leptospira, and Spirillum; archaebacteria, including Methanobacterium,
Thermoplasma, Thermophilus, or other thermophiles (e.g., Sulfolobus), and Halobacterium; and
cyanobacteria. Eukaryotes may comprise, among others, protists, including Mastigophora, Sarcodina,
Ciliophora, and Sporozoa (trypanosoma); fungi, including Saccharomyces, Schizosaccharomyces,
Candida, Neurospora, Aspergillus, Ustilago, Penicillium, and Sordaria; plants, including Chlorophyta
and Tracheophyta, such as Angiosperms and Spermopsida (e.g., tobacco, Arabidopsis, corn, rice, wheat,
tomato, potato, etc.); worms, including nematoda (e.g., Caenorhabditis, Trichinella, Trichuris) and
platyhelminthes (e.g., Diphyllobothrium, Clonorchis, and Dugesia (e.g., planaria)); insects, including
Drosophila, Manduca, Bombyx, etc.; amphibia (e.g., Xenopus, newts, salamanders, etc.); fish (e.g.,
salmon, catfish, zebrafish, Xiphophorus, trout, goldfish, tilapia, and medaka, etc.); aves (e.g., turkey,
chicken, duck, quail, and geese, etc.); mammalia, including rodentia (e.g., mice, rats, gerbils,
hamsters, etc.), lagomorpha (e.g., rabbits, hares), artiodactyla (e.g., cows, pigs, sheep, goats, etc.),
canis (e.g., domestic dog), felis (e.g., domestic cat), and primates (e.g., monkeys, chimpanzees, and
humans). Codon optimization for expression in bacteria, yeast, mammalian cells (e.g., rodent and
primate cells), and in particular human cell types, is most preferred.

Codon preference in the coding regions of human genes is given in Table II. The table shows the
relative frequency of each codon among synonymous codons. The most preferred codon for each
amino acid is marked with an asterisk (*). Methionine and tryptophan have a value of 1 since these
residues are each encoded by a single codon. For certain amino acids, such as Arg, four synonymous
codons are used at similar frequencies. Stop codons (ochre, amber, opal) are designated Z.

TABLE II

Codon   Amino acid   Frequency

TTT     Phe  F       0.43
TTC     Phe  F       0.57 *
TTA     Leu  L       0.06
TTG     Leu  L       0.12

TCT     Ser  S       0.18
TCC     Ser  S       0.23
TCA     Ser  S       0.15
TCG     Ser  S       0.06

TAT     Tyr  Y       0.42
TAC     Tyr  Y       0.58 *
TAA     Och  Z       -
TAG     Amb  Z       -

TGT     Cys  C       0.42
TGC     Cys  C       0.58 *
TGA     Opa  Z       -
TGG     Trp  W       1.00 *

CTT     Leu  L       0.12
CTC     Leu  L       0.20
CTA     Leu  L       0.07
CTG     Leu  L       0.43 *

CCT     Pro  P       0.29
CCC     Pro  P       0.33 *
CCA     Pro  P       0.27
CCG     Pro  P       0.11

CAT     His  H       0.41
CAC     His  H       0.59 *
CAA     Gln  Q       0.27
CAG     Gln  Q       0.73 *

CGT     Arg  R       0.09
CGC     Arg  R       0.19
CGA     Arg  R       0.10
CGG     Arg  R       0.19

ATT     Ile  I       0.35
ATC     Ile  I       0.52 *
ATA     Ile  I       0.14
ATG     Met  M       1.00 *

ACT     Thr  T       0.23
ACC     Thr  T       0.38 *
ACA     Thr  T       0.27
ACG     Thr  T       0.12

AAT     Asn  N       0.44
AAC     Asn  N       0.56 *
AAA     Lys  K       0.40
AAG     Lys  K       0.60 *

AGT     Ser  S       0.14
AGC     Ser  S       0.25 *
AGA     Arg  R       0.21
AGG     Arg  R       0.22 *

GTT     Val  V       0.17
GTC     Val  V       0.25
GTA     Val  V       0.10
GTG     Val  V       0.48 *

GCT     Ala  A       0.28
GCC     Ala  A       0.40 *
GCA     Ala  A       0.22
GCG     Ala  A       0.10

GAT     Asp  D       0.44
GAC     Asp  D       0.56 *
GAA     Glu  E       0.41
GAG     Glu  E       0.59 *

GGT     Gly  G       0.18
GGC     Gly  G       0.33 *
GGA     Gly  G       0.26
GGG     Gly  G       0.23
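As an illustration of how Table II can be applied, the sketch below transcribes its sense-codon frequencies and back-translates a protein sequence using the most frequent human codon for each amino acid. This strict one-codon-per-residue rule is a simplification: as discussed above, practical designs deviate from it to accommodate restriction sites, avoid splice signals, and so on. The function names are illustrative.

```python
# Human relative codon frequencies transcribed from Table II (stop codons omitted).
TABLE_II = {
    "F": {"TTT": 0.43, "TTC": 0.57},
    "L": {"TTA": 0.06, "TTG": 0.12, "CTT": 0.12, "CTC": 0.20, "CTA": 0.07, "CTG": 0.43},
    "S": {"TCT": 0.18, "TCC": 0.23, "TCA": 0.15, "TCG": 0.06, "AGT": 0.14, "AGC": 0.25},
    "Y": {"TAT": 0.42, "TAC": 0.58},
    "C": {"TGT": 0.42, "TGC": 0.58},
    "W": {"TGG": 1.00},
    "P": {"CCT": 0.29, "CCC": 0.33, "CCA": 0.27, "CCG": 0.11},
    "H": {"CAT": 0.41, "CAC": 0.59},
    "Q": {"CAA": 0.27, "CAG": 0.73},
    "R": {"CGT": 0.09, "CGC": 0.19, "CGA": 0.10, "CGG": 0.19, "AGA": 0.21, "AGG": 0.22},
    "I": {"ATT": 0.35, "ATC": 0.52, "ATA": 0.14},
    "M": {"ATG": 1.00},
    "T": {"ACT": 0.23, "ACC": 0.38, "ACA": 0.27, "ACG": 0.12},
    "N": {"AAT": 0.44, "AAC": 0.56},
    "K": {"AAA": 0.40, "AAG": 0.60},
    "V": {"GTT": 0.17, "GTC": 0.25, "GTA": 0.10, "GTG": 0.48},
    "A": {"GCT": 0.28, "GCC": 0.40, "GCA": 0.22, "GCG": 0.10},
    "D": {"GAT": 0.44, "GAC": 0.56},
    "E": {"GAA": 0.41, "GAG": 0.59},
    "G": {"GGT": 0.18, "GGC": 0.33, "GGA": 0.26, "GGG": 0.23},
}


def preferred_human_codons():
    """Highest-frequency codon for each amino acid in TABLE_II."""
    return {aa: max(codons, key=codons.get) for aa, codons in TABLE_II.items()}


def back_translate(protein):
    """Encode a one-letter protein sequence with preferred human codons (no stop codon)."""
    best = preferred_human_codons()
    return "".join(best[aa] for aa in protein.upper())


print(back_translate("MLSK"))  # ATGCTGAGCAAG
```

An unknown one-letter code raises a `KeyError`, which is a reasonable failure mode for a sketch like this.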
The codon optimized GFPs are made in accordance with methods well known in the art. When the
substitutions or replacements are not extensive, oligonucleotide-directed mutagenesis or other
localized mutagenesis techniques, such as replacing fragments of the gene with fragments containing
the preferred codons, are used to optimize the codons. If codon optimization is extensive, the GFP
gene may be a synthetic gene generated from overlapping oligonucleotides (Jayaraman, K. et al.
(1991) Proc. Natl. Acad. Sci. USA 88: 4084-8; Stemmer, W.P. et al. (1995) Gene 164: 49-53). The
oligonucleotides may or may not be ligated together during the process of generating the synthetic
gene. In this regard, use of the polymerase chain reaction on the hybridized overlapping
oligonucleotides allows facile generation of these synthetic genes.
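The overlapping-oligonucleotide design can be sketched as follows. The 60-nt oligonucleotide length and 20-nt overlap are hypothetical defaults, and the sketch ignores practical constraints such as overlap melting temperature, secondary structure and repeated sequence. It only shows how a gene is split into alternating-strand oligonucleotides whose overlaps, once annealed and extended by PCR, reconstitute the full sequence; `assemble` performs that reconstruction in silico as a consistency check.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_oligos(gene, length=60, overlap=20):
    """Split `gene` into fragments of at most `length` nt sharing `overlap` nt with their
    neighbours; every second fragment is written as its reverse complement so that
    adjacent oligonucleotides come from opposite strands and can anneal."""
    if not 0 < overlap < length:
        raise ValueError("need 0 < overlap < length")
    gene = gene.upper()
    step = length - overlap
    fragments, start = [], 0
    while True:
        fragments.append(gene[start:start + length])
        if start + length >= len(gene):
            break
        start += step
    return [f if i % 2 == 0 else reverse_complement(f) for i, f in enumerate(fragments)]


def assemble(oligos, overlap=20):
    """Rebuild the top strand from the oligos, checking that neighbours overlap."""
    seq = oligos[0]
    for i, oligo in enumerate(oligos[1:], start=1):
        fragment = reverse_complement(oligo) if i % 2 else oligo
        if not seq.endswith(fragment[:overlap]):
            raise ValueError(f"oligo {i} does not overlap its neighbour")
        seq += fragment[overlap:]
    return seq


random.seed(0)
gene = "".join(random.choice("ACGT") for _ in range(150))  # random stand-in, not a GFP gene
oligos = design_oligos(gene)
print(len(oligos), assemble(oligos) == gene)  # 4 True
```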

In accordance with the present invention, exemplary codon optimized variants for expression in
human cells are provided by SEQ ID NO: 1 for Renilla muelleri GFP and SEQ ID NO: 2 for Ptilosarcus
gurneyi GFP (Figures 2 and 3, respectively). In the codon optimized rmGFP, the codons for 9 of the 239
amino acids are not the preferred human codons, in order to accommodate restriction sites used for
constructing various rmGFP fusion proteins. For the codon optimized pGFP, the codons for 11 of the 239
amino acids are not the preferred codons, for the same reasons. It will be understood, however, that the
codon optimized sequences of the present invention are by no means limited to the representative
sequences provided herein. In view of the preceding discussion, one of skill in the art will readily be able
to prepare a number of different codon optimized GFP sequences for expression in a given organism,
especially for expression in human cells, or other cells as outlined herein.

In a preferred embodiment, the rGFP or pGFP protein, including variants, is fused to a protein of
interest, including peptides as outlined herein. By "fused" or "operably linked" herein is meant that the
peptide, as defined below, and the rGFP or pGFP protein are linked together. In a preferred
embodiment, fusion nucleic acids are made such that fusion polypeptides, e.g., a single polypeptide,
are made. In an alternative embodiment, fusion nucleic acids comprising separation sites (e.g.,
protease recognition sequences, 2A sequences, or IRES sequences) are made, as further described
below. In one preferred embodiment, the fusion disrupts the fluorescence characteristic of the rGFP
or pGFP. That is, the fluorescence characteristic of the rGFP or pGFP is changed, including under
different solution conditions (e.g., temperature, pH, ion concentration, halide concentration, membrane
potential, etc.). In another preferred embodiment, the fusion only minimally disrupts the stability of the
rGFP or pGFP. That is, the rGFP or pGFP preferably retains its fluorescence, or maintains a Tm (thermal
melting temperature) of at least 42°C.

In a preferred embodiment, the present invention is also useful in marking viruses and cells and as
reporters for cell proliferation, as further illustrated below. General expression or specific regulated
expression of the fusion proteins marks the cell, either constitutively or at specific periods in
development. These marked viruses and cells may be detected and tracked to determine their
migration or proliferation in an organism or in response to specific biological signals, for example
cytokines and chemokines. As further described below, these cells may be used in screens to identify
candidate agents that alter the infectivity, migration or proliferation of these viruses or cells in
response to the biological signals.

In a preferred embodiment, the fusions to rGFP or pGFP are used for tracking or localizing the protein
to a particular subcellular location, quantitating gene expression, displaying peptides, indicating
cellular reactions, marking cell growth and proliferation, etc. The fusions may be made to any
protein of interest encoded by any gene of interest. These include genomic DNA, cDNA, proteins,
interaction domains, targeting sequences (e.g., localization sequences), stability sequences,
protein-modification sequences (e.g., phosphorylation, ADP ribosylation, lipidation, glycosylation,
protease sites, etc.), random peptides, and biosensor sequences, as further discussed below. The
fusions may be made to the amino terminus, the carboxy terminus, or internally to the GFP sequence.
When the fusions are internal to the rGFP or pGFP, they are preferably in the internal loops of the
fluorescent proteins. In a preferred embodiment, the fusions do not affect the fluorescence, which
allows direct detection of the fusion protein. In another aspect, detecting the fusion protein uses a label
that binds the fusion protein, such as a labeled antibody directed against rGFP or pGFP or the fused
gene of interest, in which case the fusion protein need not be fluorescent. As outlined below, the fusion
polypeptide (or fusion polynucleotide encoding the fusion polypeptide) can comprise additional
components, including multiple peptides at multiple loops, fusion partners, linkers, etc.

In a preferred embodiment, fusions to rGFP or pGFP, preferably a codon optimized variant, are
used to track and localize proteins intracellularly or extracellularly. Fusions may be made to any protein
of interest to examine cellular processing events of the subject protein. Proteins of interest include
cytoskeletal proteins for tracking cell movement and cell structure; focal adhesion proteins involved in
cell adherence; nuclear proteins for examining signals involved in nuclear transport; and nuclear
membrane proteins involved in nuclear membrane dissolution and reformation. Such fusions may also
be used to examine cell organelle replication and structure; intracellular transport of proteins (e.g.,
targeting signals); development of structural polarity in cells (e.g., neuronal or epithelial cells); cell
division processes; and the like. Many of these processes are abnormal in diseased cells, such as
cancer cells. These fusion proteins expressed in cells are useful for identifying candidate agents that
affect these biological processes in particular cell types. Thus, screens may be conducted for agents
that confer a phenotype similar to a disease cell or for agents that convert an abnormal cell,
characterized by an abnormal cellular process, to a normal cell.

In another preferred embodiment, the fusions are made to protein-modification sequences. These
sequences may be any sequence capable of being modified by any modification process. In a preferred
embodiment, the modification sequence is modified by another protein or enzyme. In one aspect,
the modification sequence comprises a phosphorylation sequence (Yang, F. et al. (1997) Anal.
Biochem. 266: 167-73). A variety of phosphorylation sequences are known (e.g., src homology
domain SH2 and SH3) and are recognized by kinases that attach phosphates to specific amino acids
(e.g., serine, threonine, tyrosine, histidine) (see Kreegipuu, A. et al. (1999) Nucleic Acids Res. 27:
237-39). The phosphorylation sequences are fused to GFP to allow correct presentation to the
cognate kinases. Phosphorylation of the sequence may or may not affect the fluorescence
properties of rGFP or pGFP. By "fluorescence properties" herein is meant any detectable change in
the fluorescence characteristic of the GFP. This may involve the molar extinction coefficient at the
appropriate excitation wavelength, fluorescence quantum yield, excitation and emission spectra, ratio
of excitation amplitudes at two different wavelengths, ratio of emission amplitudes at two different
wavelengths, excited-state lifetime, and fluorescence quenching.

In one preferred embodiment, phosphorylation of the fusion proteins does not affect the fluorescence
characteristic. In this context, the GFP provides a scaffold for efficient presentation of the sequence
as a substrate for a kinase. The phosphorylation is detected by direct labeling with a labeled nucleotide
substrate (e.g., ATP) or by reaction with antibodies specific for phosphorylated sequences. In another
preferred embodiment, phosphorylation of the fusion protein changes the fluorescence characteristics
of the fluorescent protein such that the change provides an indication of kinase activity (see U.S. Pat.
No. 6,248,550, expressly incorporated by reference). Generally, the kinase substrate fusion protein
displays distinguishable properties in the phosphorylated and unphosphorylated states.
Measuring the change in the fluorescence characteristic before and after contacting with the kinase
provides a measure of kinase activity.

In another preferred embodiment, translocation from one cellular location to another, or the ability to
interact with a phosphoprotein binding domain, provides another measure of phosphorylation. It is well
known that phosphorylation of specific sequences alters the interaction of the sequence with a cognate
binding partner. Phosphorylation may prevent or enhance these interactions. Thus, phosphorylation is
detectable by examining the affinity of the binding partners for the fusion protein or by examining
changes in the intracellular location of the rGFP or pGFP fusion polypeptide (see, for example,
Durocher, D. et al. (2000) Mol. Cell 6: 1169-82; Yaffe, M.B. et al. (2001) Structure 9: R33-8).

In another preferred embodiment, the phosphorylation substrates are candidate substrates comprising
a library of random peptides, a library of cDNA fragments, or a library of genomic nucleic acid fragments
fused to rGFP or pGFP, as discussed below. In one aspect, the library of candidate substrates is
expressed in host cells, each of which expresses a different candidate substrate. A kinase is
contacted with the fusion proteins, for example by transfecting the cells with a vector expressing the
kinase or by exposing the cells to a condition that induces kinase activity. Peptides that affect the
fluorescence properties or localization of the rGFP or pGFP fusion protein substrate following treatment
with the kinase are identified. Sequences producing detectable changes are isolated and sequenced to
determine the putative kinase substrate sequences.

The general approach outlined above is applicable to a variety of other protein modification
reactions. For example, adenosine diphosphate (ADP)-ribosyltransferases bind nicotinamide
adenine dinucleotide (NAD) and catalyze the transfer of the ADP-ribose moiety to an acceptor
nucleophile, with cleavage of the glycosidic bond between N-1 of the nicotinamide and C-1 of the
adjacent ribose. The modification may comprise mono-ADP ribosylation or poly-ADP ribosylation,
depending on the transferase enzyme (Koch-Nolte, F. (2001) J. Biotechnol. 92: 81-87). Bacterial
toxins, such as pertussis toxin and cholera toxin, act by ADP ribosylating heterotrimeric GTP
binding proteins that control intracellular signaling and vesicle trafficking. Poly-ADP ribosylation
appears to play roles in DNA damage recovery, DNA replication and viral integration. Thus, the
present invention provides fusion proteins comprising rGFP or pGFP and ADP ribosylation sites,
made in the same manner as that provided for phosphorylation sites. Mono- and poly-ADP ribosylated
sequences include, among others, those present on heterotrimeric G proteins (Yamamoto, M. (1993)
Oncogene 8: 1449-55; Finck-Barbançon, V. (1995) Biochemistry 34: 1070-75; and
von Olleschik-Elbheim, L. (1997) Adv. Exp. Med. Biol. 419: 87-91), the muscle protein desmin
(Zhou, H., et al. (1996) Arch. Biochem. Biophys. 334: 214-222), poly-ADP ribosylase (Martinez, M.
(1991) Biochem. Biophys. Res. Commun. 181: 1412-8), and phosphorylase kinase (Okazaki, I.J.
(1996) Adv. Pharmacol. 35: 247-^0).

In another preferred embodiment, the fusion proteins comprise rGFP or pGFP fused to protease
recognition sequences for detecting protease activity. Biological functions of proteases are well known
in the art, including, but not limited to, pathogenesis (e.g., polyprotein processing by HIV protease), cell
death (e.g., caspases), cell adhesion (e.g., metalloproteases), and the like. In one aspect, protease
recognition sequences, as further described below, are fused to rGFP or pGFP or variants thereof.
Cleavage of the fusion protein changes the fluorescence characteristic, which provides a measure of
protease activity. In one aspect, the cleavage site is inserted into the rGFP or pGFP. That is, the
protease recognition sequence is inserted into the internal regions of the GFP, preferably the surface
loops.

In another preferred embodiment, the protease substrates may comprise a fusion of rGFP or pGFP to
rGFP or pGFP variants or other fluorescent proteins such that fluorescence resonance energy transfer
(FRET) is possible between the two linked fluorescent molecules. Generally, fluorescence resonance
energy transfer occurs between two dye molecules in which excitation is transferred from a donor
molecule to an acceptor molecule without emission of a photon. Donor and acceptor molecules must
be in close proximity (i.e., radial distance within approximately 10 nm of each other) and have their
transition dipole orientations approximately parallel to each other. For excitation transfer from donor to
acceptor to occur, the absorption spectrum of the acceptor must overlap the fluorescence emission
spectrum of the donor. Suitable pairs of fluorescent molecules capable of producing a FRET signal
may include rGFP or pGFP with BFP (blue fluorescent protein; Heim, R. et al. (1996) Curr. Biol. 6:
178-82), rGFP or pGFP with BFP5 (Mitra, R.D. (1996) Gene 173: 13-17), rGFP or pGFP with cyan
fluorescent protein (CFP), rGFP or pGFP with Anemonia majano fluorescent protein amFP486 (Matz,
M.V. (1999) Nat. Biotechnol. 17: 969-73), rGFP or pGFP with Discosoma striata dsFP483 (Matz,
supra), rGFP or pGFP with Clavularia cFP484 (Matz, supra), and the like. In these donor-acceptor
pairs, the rGFP or pGFP functions as the acceptor. In principle, other donor-acceptor pairs are
possible in which rGFP or pGFP serves as the donor to acceptor fluorescent protein variants whose
excitation and emission peaks are about 20 nm or more longer than those of rGFP or pGFP. Examples
of suitable acceptors include yellow fluorescent protein (i.e., class 4 GFPs; see Tsien, R. (1998) Ann.
Rev. Biochem. 67: 509-44), Zoanthus zFP538 (Matz, supra), Discosoma drFP583 (Matz, supra), and
the like. The protease recognition site is incorporated as part of the linker sequence connecting the
donor and acceptor GFPs. Cleavage of the linker by proteases results in physical separation of the
two fluorescent proteins, thus resulting in loss of FRET. A variety of protease and protease
recognition sequence combinations may be used, as further described below. The reactions may
occur in vitro by contacting the protease with a FRET protease substrate. In another aspect, the
reactions are done in vivo by expressing the protease substrates in the cell and introducing vectors
expressing the protease or inducing the protease activity by appropriate treatment of the cells. In one
preferred embodiment, the FRET protease substrate and/or protease are introduced into the cell by
retroviral vectors.
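The distance dependence behind this loss of FRET follows the Förster relation E = 1/(1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient; R0 itself depends on the spectral overlap and on the relative dipole orientation of the pair. The sketch below assumes an illustrative R0 of 5 nm; the actual R0 for any rGFP or pGFP pair is not given here and would have to be determined experimentally.

```python
def fret_efficiency(r_nm, r0_nm):
    """Förster transfer efficiency for donor-acceptor distance r and Förster radius R0."""
    if r_nm <= 0 or r0_nm <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


R0_NM = 5.0  # assumed, illustrative Förster radius (not a measured value for rGFP/pGFP)
for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm  E = {fret_efficiency(r, R0_NM):.3f}")
```

Because of the sixth-power dependence, efficiency falls from 0.5 at r = R0 to roughly 0.015 at r = 2·R0, which is why separating donor and acceptor by linker cleavage abolishes most of the FRET signal.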

Since FRET-based reactions provide a basis for monitoring various biological processes, FRET using
rGFP or pGFP or their variants as either the donor or acceptor is also applicable for examining various
biological reactions. In one preferred embodiment, the FRET molecule comprising rGFP or pGFP,
acting as either a donor or acceptor molecule, further comprises a sequence capable of binding an
analyte or ligand, which causes a change in the spatial orientation of the donor fluorescent protein and
the acceptor fluorescent protein relative to one another (see US Pat. No. 6,197,928, hereby expressly
incorporated by reference). In one preferred embodiment, the ligand binding region is fused to the two
fluorescent proteins without linkers. In another preferred embodiment, the ligand binding region is
fused to the fluorescent proteins by linkers to provide proper spatial orientation between the donor and
acceptor fluorescent proteins for FRET to occur and to permit binding of ligand to the binding
sequence.

Various binding regions may be used with the present invention. These include calcium binding
regions (Miyakawa, A. et al. (1997) Proc. Natl. Acad. Sci. USA 93: 13617-22; Rosomer, V.A. (1997) J.
Biol. Chem. 272: 13270-74), protein interaction domains (e.g., phosphoprotein binding domains),
receptors (e.g., Fas), and the like (see US Pat. No. 6,197,928). Linkers may comprise glycines or
serines or combinations thereof to prevent structural perturbations between the GFPs and the binding
domains (e.g., to allow proper folding of the proteins). Linker sequences are appropriately
positioned to cause either an increase or a decrease in FRET upon binding of ligand to the binding
sequence. In one aspect, various mutant forms of the binding domain may be made to maximize the
range of ligand concentrations capable of being detected in vivo or in vitro by FRET. Fusing these
fusion proteins to targeting sequences allows the concentration of the analytes to be measured within
particular subcellular compartments. In a preferred embodiment, the GFPs used for FRET and their
corresponding binding regions and linker sequences are codon optimized to maximize expression
within particular cells, especially mammalian cells. Codon optimization is employed because
non-optimized forms may not produce sufficient changes in FRET signal to act as a FRET reporter
molecule.
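In its simplest form, codon optimization back-translates each residue using a codon favoured in the host cell. The sketch below is illustrative only: the one-codon-per-residue table is an assumption (an approximation of commonly used human codons), not the table used for any rGFP or pGFP construct, and the function name is hypothetical.

```python
# Assumed host-preferred codons, one per residue ("*" = stop). Illustrative
# only; a real design would use a measured codon-usage table for the host.
ASSUMED_PREFERRED_CODONS = {
    "A": "GCC", "R": "AGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def codon_optimize(protein: str, table=ASSUMED_PREFERRED_CODONS) -> str:
    """Back-translate a protein sequence using one preferred codon per residue."""
    try:
        return "".join(table[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"no codon for residue {err.args[0]!r}") from None


# Example: back-translating a short peptide.
print(codon_optimize("PKKKRKV"))
```

A practical design would usually also balance codon frequencies and screen out unwanted restriction sites, which this sketch does not attempt.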

In another preferred embodiment, the FRET-based reactions do not use a sequence that physically
links the donor and acceptor pairs. That is, the donor and acceptor fluorescent fusion proteins exist
separately. In this preferred embodiment, rGFP or pGFP fusions may be made to protein interaction
domains. Thus, a first fusion protein comprises a first protein interaction domain fused to rGFP or
pGFP, or their variants. A second fusion protein comprises a second protein interaction domain,
which is capable of interacting with the first protein interaction domain, fused to a fluorescent protein
capable of undergoing FRET with rGFP or pGFP. Juxtaposition of the two fluorescent proteins
through the protein interaction regions results in a FRET signal. In general, fused fluorescent proteins
separated by a linker provide a positive control for a detectable FRET signal. Conversely, expression
of each fluorescent protein fused to its cognate protein interaction domain provides a negative control
for determining background signal and the relative signal intensities of the two fluorescent proteins.
Cells expressing the fusion proteins may be examined in vivo, in vitro, or after fixation in a chemical
fixative (e.g., formaldehyde, paraformaldehyde, glutaraldehyde). Generally, measuring the FRET ratio
provides one basis for determining interaction between the two protein interaction domains (Miyakawa,
A. et al. (2000) Methods Enzymol. 327: 472-500). As further described below, the protein interaction
domain comprises any sequence capable of interacting with other molecules, including other proteins,
nucleic acids, lipids, carbohydrates, and the like. The interaction domains may be identical, in which
case homomultimeric interactions may be examined, while in other cases the interaction domains are
different, in which case heteromultimeric interactions may be examined (Guo, C. et al. (1995) J. Biol.
Chem. 270: 27562-68; Mahajan, N.P. (1998) Nat. Biotechnol. 16: 547-52; Ng, E.K. (2002) J. Cell.
Biochem. 84: 556-66; and Day, R.N. (2001) Methods 25: 4-18).
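A minimal sketch of how sensitized-emission readings might be reduced to a FRET ratio and placed on a scale set by the negative and positive controls described above. The background subtraction, the acceptor/donor ratio and the linear 0-to-1 normalization are assumptions chosen for illustration, not a procedure taken from the cited references, and the intensity values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Emission:
    """Raw emission intensities from one sample (arbitrary units)."""
    donor: float     # donor channel, excited at the donor wavelength
    acceptor: float  # acceptor channel, excited at the donor wavelength


def fret_ratio(sample: Emission, background: Emission) -> float:
    """Background-subtracted acceptor/donor emission ratio."""
    donor = sample.donor - background.donor
    acceptor = sample.acceptor - background.acceptor
    if donor <= 0:
        raise ValueError("donor signal must exceed background")
    return acceptor / donor


def normalized_fret(ratio: float, negative: float, positive: float) -> float:
    """Linear scale: negative control -> 0.0, linked positive control -> 1.0."""
    if positive == negative:
        raise ValueError("controls must give different ratios")
    return (ratio - negative) / (positive - negative)


# Hypothetical readings, for illustration only.
background = Emission(donor=100.0, acceptor=80.0)
neg = fret_ratio(Emission(donor=1100.0, acceptor=580.0), background)    # 0.5
pos = fret_ratio(Emission(donor=600.0, acceptor=1080.0), background)    # 2.0
sample = fret_ratio(Emission(donor=850.0, acceptor=830.0), background)  # 1.0
print(round(normalized_fret(sample, neg, pos), 3))  # 0.333
```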

Since fluorescent proteins serve as useful reporters of cellular events, the present invention further
relates to fusion proteins comprising rGFP or pGFP fused to various protein interaction domains
whose interactions change depending on the physiological state of the cell. These fusion proteins
serve as biosensors, as defined below. Protein interaction domains whose interactions with binding
partners change with different cellular states are well known in the art. As illustrated below, pleckstrin
domains bind specifically to PtdInsP2, which is released from the membrane by the action of
phospholipases activated by signal transduction events. Phosphoprotein binding domains (e.g., SH2
domains) interact with specific phosphorylated peptide sequences as part of their mechanism of signal
transduction. The voltage sensing domain of voltage-sensitive ion channels (e.g., Shaker potassium
channels) shifts within the membrane depending on the membrane potential, thus altering the solution
environment of sequences adjacent to the voltage sensor (Siegel, M.S. (1997) Neuron 19: 735-41). In
the present invention, rGFP, pGFP or variants thereof are fused to these sequences to generate
fusion proteins whose cellular localization or fluorescence properties change depending on the
physiological state of the cell. Changes in cellular localization may be determined by
fluorescence microscopy, while changes in fluorescence may be examined by measuring the
fluorescence characteristics at two different cellular states.

In another preferred embodiment, the fusion polypeptides comprise rGFP or pGFP fused to peptides
or proteins encoded by cDNA or cDNA fragments. As used herein, by cDNA is meant a DNA that is
complementary to at least a portion of an RNA, preferably a messenger RNA, and is generally
synthesized from an RNA preparation using reverse transcriptase. As further described below, the
cDNA may be full length (i.e., complementary to the full-length RNA) or a partial cDNA, which is
shorter than the full-length RNA. The cDNA may be a cDNA fragment, which is derived from a larger
cDNA by methods described below. Methods for constructing cDNA libraries from RNA, especially
mRNA, are well known in the art (see Ausubel, F. in Current Protocols in Molecular Biology, John
Wiley & Sons, updated October 2001, Chapter 5, Construction of Recombinant DNA Libraries,
particularly Section III, Preparation of Insert DNA from Messenger RNA, expressly incorporated by
reference herein). In addition, two commonly used methods of producing cDNA are described in
Okayama and Berg (1982) Mol. Cell. Biol. 2: 161-170 and Gubler and Hoffman (1983) Gene 25:
263-269. In a preferred embodiment, the cDNAs are inserted into the carboxy- or the amino-terminal
region of rGFP or pGFP. In another preferred embodiment, cDNA is inserted into the internal regions
of rGFP or pGFP. Preferably, the insertions do not affect the fluorescence of the rGFP or pGFP, to
allow monitoring of cDNA expression. Fusions to the amino-terminal or internal regions of rGFP or
pGFP permit identification of cDNAs that are in frame with respect to the GFP protein, as indicated by
the expression of fluorescent fusion proteins. Preferably, codon-optimized rGFP or pGFP variants are
used to maximize expression of the fusion polypeptides and to increase the fluorescence signal of
expressed fusion nucleic acids.
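Whether an insert at the amino terminus or at an internal site keeps the downstream GFP sequence in frame can be checked at the sequence level before any fluorescence is measured. The helper below is a hypothetical sketch: it assumes the insert is read from its first base in the GFP reading frame and checks only the three standard stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def keeps_gfp_in_frame(insert: str) -> bool:
    """True if an N-terminal or internal insert leaves downstream GFP codons in frame.

    Requires (1) a length that is a multiple of three, so the reading frame is
    preserved, and (2) no in-frame stop codon that would truncate the fusion.
    """
    seq = insert.upper()
    if len(seq) % 3 != 0:
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


print(keeps_gfp_in_frame("ATGGCTAAA"))  # True: three sense codons
print(keeps_gfp_in_frame("ATGGCTAA"))   # False: 8 nt shifts the frame
print(keeps_gfp_in_frame("ATGTAAGCT"))  # False: in-frame TAA stop
```

Fluorescence of the expressed fusion remains the experimental readout; this check only predicts which inserts could give it.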

As provided more fully below, cDNA may be generated from any number of organisms and cell types,
including cDNAs generated from eukaryotic and prokaryotic cells, viruses, cells infected with viruses,
pathogens or genetically altered cells. The cDNA may encode specific domains, such as
signaling domains, protein interaction domains, membrane binding domains, targeting domains, and
the like. Furthermore, the cDNA may be frameshifted by adding or deleting nucleotides, which may
result in an out-of-frame construct, such that a pseudorandom peptide or protein is encoded. In
addition, the cDNAs and cDNA libraries contemplate various subtracted cDNA or enriched cDNA
libraries (e.g., secreted or membrane proteins; see Kopczynski, C.C. (1998) Proc. Natl. Acad. Sci.
USA 95: 9973-78). That is, a cDNA library may be a complete cDNA library from a cell, a partial
library, an enriched library from one or more cell types, or a constructed library with certain cDNAs
being removed to form a library.

In another preferred embodiment, the fusion polypeptides comprise rGFP or pGFP fused to proteins
or peptides encoded by genomic DNA. As elaborated above for cDNA, the genomic DNA can be
derived from any number of organisms or cells, including genomic DNA of eukaryotic or prokaryotic
cells, or viruses. It may be from normal cells or cells defective in cellular processes, such as
tumor suppression, cell cycle control, or cell surface adhesion. As more fully explained below, the
genomic DNA may be from entire genomic constructs or fractionated constructs, including random or
targeted fractionation.

In another preferred embodiment, the fusion polypeptides comprise rGFP or pGFP fused to random
peptides. Generally, peptides ranging from about 4 amino acids in length to about 100 amino acids
may be used, with peptides ranging from about 5 to about 50 being preferred, with from about 8 to
about 30 being particularly preferred and from about 10 to about 25 being especially preferred. As
more fully explained below, the peptides are fully randomized or they are biased in their
randomization. In one preferred embodiment, the random peptide is linked to a fusion partner to
structurally constrain the peptide and allow proper interaction with other molecules, while in another
preferred embodiment, the expressed random peptide is not linked to a fusion partner. Random
peptides expressed as fusions with rGFP, pGFP, or variants thereof may be screened for their ability to
produce an altered cellular phenotype.
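Drawing fully randomized versus biased peptides can be sketched as below. The cysteine-free weighting is a hypothetical example of biased randomization, the length of 15 residues is one value from the 10 to 25 residue range described above as especially preferred, and the theoretical diversity of a fully random peptide of length n is 20^n.

```python
import random
from typing import Mapping, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length: int, rng: random.Random,
                   weights: Optional[Mapping[str, float]] = None) -> str:
    """Draw one random peptide; optional per-residue weights bias the draw."""
    if not 4 <= length <= 100:
        raise ValueError("length outside the ~4-100 residue range in the text")
    if weights is None:  # fully randomized: each residue equally likely
        return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    residues = list(weights)
    return "".join(rng.choices(residues, weights=[weights[r] for r in residues], k=length))


def theoretical_diversity(length: int) -> int:
    """Number of distinct fully random peptides of this length (20**length)."""
    return len(AMINO_ACIDS) ** length


rng = random.Random(0)  # seeded so the draw is reproducible
unbiased = random_peptide(15, rng)
no_cysteine = {aa: (0.0 if aa == "C" else 1.0) for aa in AMINO_ACIDS}  # hypothetical bias
biased = random_peptide(15, rng, no_cysteine)
print(unbiased, biased, f"{theoretical_diversity(15):.2e}")
```

Because 20^15 is about 3.3 × 10^19, any physically realized library samples only a small fraction of the possible sequences at this length.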

For the fusion polypeptides of the present invention, the fusions are made in a variety of ways. In one
preferred embodiment, the peptide is fused to the N-terminus of the rGFP or pGFP. The fusion can
be direct, i.e., with no additional residues between the C-terminus of the peptide and the N-terminus of
the rGFP or pGFP, or indirect; that is, intervening amino acids are used, such as one or more fusion
partners, including a linker. In this embodiment, when the fusions are to peptides, such as random
peptides or protein interaction domains, a presentation structure is preferably used to confer some
conformational stability to the peptide. Particularly preferred embodiments include the use of
dimerization sequences.

In one embodiment, N-terminal residues of the rGFP or pGFP are deleted, i.e., one or more amino
acids of the rGFP or pGFP can be deleted and replaced with the protein or peptide of interest.
However, as noted above, deletions of more than 7 amino acids may render the rGFP or pGFP less
fluorescent, and thus larger deletions are generally not preferred. In a preferred embodiment, the
fusion is made directly to the first amino acid of the rGFP or pGFP.

In a preferred embodiment, the peptide is fused to the C-terminus of the rGFP or pGFP. As above for
N-terminal fusions, the fusion can be direct or indirect, and C-terminal residues may be deleted.

In a preferred embodiment, proteins, peptides and fusion partners are added to both the N- and the
C-terminal regions of the rGFP or pGFP. As the N- and C-terminal regions of rGFP and pGFP are






WO 02/090535



PCT/US02/14766 



putatively on the same face of the protein and in spatial proximity (within 18 Å), as is the case for
aGFP, it is possible to make a non-covalently "circular" rGFP or pGFP using the components of the
invention. Thus, for example, the use of dimerization sequences can allow a non-covalently cyclized
protein: by attaching a first dimerization sequence to either the N- or C-terminus of rGFP or pGFP, and
adding a peptide of interest and a second dimerization sequence to the other terminus, a large
compact structure can be formed, with the protein or peptide displayed in a structure constrained by
the dimerization sequences.

In a preferred embodiment, the protein or peptide of interest is fused to an internal position of the
rGFP or pGFP; that is, the peptide is inserted at an internal position of the rGFP or pGFP. While the
peptide can be inserted at virtually any position, preferred positions include insertion at the very tips of
"loops" on the surface of the rGFP or pGFP, to minimize disruption of the rGFP and pGFP β-can
protein structure. Thus, the rGFP or pGFP fusion polypeptide retains its ability to fluoresce, or
maintains a Tm of at least 42 °C under assay conditions.

In a preferred embodiment, the proteins, peptides or other fusion partners are inserted in rGFP and/or
pGFP loops. That is, as outlined below, peptides or libraries of peptides can be inserted into external
loops (e.g., without replacing any residues) or can replace parts of external loops, the added peptides
or other fusion partners replacing one or more of the native residues. In a preferred embodiment, the
loop comprises residues from about 51 to about 62 for rmGFP or pGFP, and residues from about 48
to about 58 for rrGFP. Similar preferred embodiments utilize replacements or insertions at positions
from about 79 to about 84 of both rmGFP and pGFP (about 76 to about 81 for rrGFP); replacements
or insertions at positions from about 101 to about 107 (about 99 to about 104 for rrGFP);
replacements or insertions at positions from about 117 to about 120 (about 114 to about 117 for
rrGFP); replacements or insertions at positions from about 130 to about 148 (about 127 to about 145
for rrGFP); replacements or insertions at positions from about 154 to about 160 (about 151 to about
157 for rrGFP); replacements or insertions at positions from about 170 to about 177 (about 167
to about 174 for rrGFP); replacements or insertions at positions from about 186 to about 197 (about
183 to about 194 for rrGFP); and replacements or insertions at positions from about 206 to about 213
(about 202 to about 211 for rrGFP). More preferably, the insertion or replacement will take place
between residues 117-120 for rmGFP or pGFP (114-117 for rrGFP); 170-177 (167-174 for rrGFP); or
206-213 (202-211 for rrGFP). Most preferably, the insertion will take place between residues 170-177
or 208-213 of rmGFP or pGFP and the corresponding residues of rrGFP.
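When choosing a splice site, the loop positions listed above can be kept as a lookup table. The sketch below transcribes the approximate ranges given in the text (rmGFP and pGFP share numbering; rrGFP is listed separately). The data layout and function names are illustrative, and because the text says "about", the boundaries should be treated as approximate.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]

# Approximate loop ranges (inclusive), transcribed from the text above.
LOOPS = {
    "rmGFP/pGFP": [(51, 62), (79, 84), (101, 107), (117, 120), (130, 148),
                   (154, 160), (170, 177), (186, 197), (206, 213)],
    "rrGFP": [(48, 58), (76, 81), (99, 104), (114, 117), (127, 145),
              (151, 157), (167, 174), (183, 194), (202, 211)],
}
# The "more preferably" subset.
MORE_PREFERRED = {
    "rmGFP/pGFP": [(117, 120), (170, 177), (206, 213)],
    "rrGFP": [(114, 117), (167, 174), (202, 211)],
}


def loop_containing(scaffold: str, position: int) -> Optional[Range]:
    """Return the listed loop range that contains position, or None."""
    for start, end in LOOPS[scaffold]:
        if start <= position <= end:
            return (start, end)
    return None


def is_more_preferred(scaffold: str, position: int) -> bool:
    """True if position falls in one of the "more preferably" ranges."""
    return any(start <= position <= end for start, end in MORE_PREFERRED[scaffold])


print(loop_containing("rrGFP", 170), is_more_preferred("rmGFP/pGFP", 210))
```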

In a preferred embodiment, the peptide of interest is inserted without any deletion of rGFP or pGFP
residues. That is, the insertion point is between two amino acids in the loop, adding the new amino
acids of the peptide and fusion partners, including linkers. Generally, when linkers are used, the
linkers are directly fused to the rGFP or pGFP, with additional fusion partners, if present, being fused
to the linkers and the peptides.

In a preferred embodiment, the peptide is inserted into the rGFP or pGFP, with one or more rGFP or
pGFP residues being deleted; that is, the peptide (and fusion partners, including linkers) replaces one
or more residues. In general, when linkers are used, the linkers are attached directly to the rGFP or
pGFP. Thus, it is linker residues that replace the GFP residues, again generally at the tip of the
loop. In general, when residues are replaced, from one to five residues of GFP are deleted, with
deletions of one, two, three, four and five amino acids all possible. In another preferred embodiment,
fusion polypeptides of the invention do not include linkers. When linkers are not used, the fusion
polypeptides will be significantly more constrained because of the reduction in conformational freedom
imposed by the GFP structure.
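At the sequence level, pure insertion and replacement of one to five loop residues both reduce to a string splice. The scaffold string, splice point and Gly-Ser linker below are hypothetical placeholders rather than sequences from this disclosure.

```python
def splice_into_loop(scaffold: str, cut: int, peptide: str,
                     delete: int = 0, linker: str = "") -> str:
    """Place linker + peptide + linker after the first `cut` scaffold residues.

    delete=0 is a pure insertion between two loop residues; delete=1..5
    removes that many scaffold residues at the cut, so the insert (whose
    linker residues attach directly to the scaffold) takes their place.
    """
    if not 0 <= delete <= 5:
        raise ValueError("the text describes deleting one to five residues")
    if not 0 < cut < len(scaffold) or cut + delete > len(scaffold):
        raise ValueError("splice point must fall inside the scaffold")
    return scaffold[:cut] + linker + peptide + linker + scaffold[cut + delete:]


scaffold = "MKTAYDGSLKV"  # hypothetical stand-in for a GFP loop region
print(splice_into_loop(scaffold, 6, "WWW"))
print(splice_into_loop(scaffold, 6, "WWW", delete=2, linker="GGS"))
```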

In a preferred embodiment, peptides (including fusion partners, if applicable) can be inserted into
more than one loop of the scaffold, the amino-terminal region, the carboxy-terminal region, or
combinations thereof. Thus, for example, adding peptides to two loops can increase the complexity of
a random peptide library but still allow presentation of these loops on the same face of the protein.
Similarly, it is possible to add peptides to one or more loops, and add other fusion partners (for
example, targeting sequences) to other loops or to the amino- or carboxy-terminal regions, to provide
additional biological properties to the fusion polypeptide or to localize the peptide to subcellular or
extracellular compartments where molecular interactions can take place.

Accordingly, in a preferred embodiment, the fusion polypeptides may further comprise fusion partners.
By "fusion partner" herein is meant a sequence that is associated with the peptide and that confers
upon all members of the library in that class a common function or ability. Fusion partners can be
heterologous (i.e., not native to the host cell) or synthetic (i.e., not native to any cell). Suitable fusion
partners include, but are not limited to: a) presentation structures, as defined below, which provide the
peptides in a conformationally restricted or stable form; b) targeting sequences, defined below, which
allow the localization of the peptide into a subcellular or extracellular compartment; c) rescue
sequences, as defined below, which allow the purification or isolation of either the peptides or the
nucleic acids encoding them; d) stability sequences, which confer stability or protection from
degradation on the peptide or the nucleic acid encoding it, for example resistance to proteolytic
degradation; e) linker sequences, which conformationally decouple the random peptide elements from
the scaffold itself and keep the peptide from interfering with scaffold folding; f) any protein of
interest; or g) any combination of the above, as well as linker sequences as needed. Since particular
fusion partners are active in certain organisms or cells while not active in others, those skilled in the
art can choose the appropriate fusion partner for particular cells or organisms.

In a preferred embodiment, the fusion partner is itself a presentation structure. By "presentation
structure" or grammatical equivalents herein is meant a sequence which, when fused to peptides,
causes the peptides to assume a conformationally restricted form. Proteins interact with each other
largely through conformationally constrained domains. Although small peptides with freely rotating
amino and carboxyl termini can have potent functions, as is known in the art, the conversion of such
peptide structures into pharmacologic agents is difficult due to the inability to predict side-chain
positions for peptidomimetic synthesis. Therefore, the presentation of peptides in conformationally
constrained structures will benefit the later generation of pharmaceuticals and will also likely lead
to higher affinity interactions of the peptide with a target protein. This fact has been recognized in
combinatorial library generation systems using biologically generated short peptides in bacterial phage
systems. A number of workers have constructed small domain molecules in which one might present
peptide structures (e.g., randomized peptide sequences).

Thus, synthetic presentation structures, i.e., artificial polypeptides, are capable of presenting a peptide
as a conformationally restricted domain. Generally, such presentation structures comprise a first
portion joined to the N-terminal end of the peptide of interest, and a second portion joined to the
C-terminal end of the peptide; that is, the peptide is inserted into the presentation structure, although
variations may be made, as outlined below, in which elements of the presentation structure are
included within the peptide sequence. To limit the background cellular effects of protein sequences
that are not part of the expressed protein or peptide of interest, the presentation structures are
selected or designed to have minimal biological activity when expressed in the target cell.

Preferred presentation structures enhance interaction with binding partners by conformationally
constraining the displayed peptide and maximizing accessibility to the peptide by presenting it on an
exterior surface such as a loop. Accordingly, suitable presentation structures include, but are not
limited to, dimerization sequences, minibody structures, loops on β-turns and coiled-coil stem
structures in which residues not critical to structure are randomized, zinc-finger domains,
cysteine-linked (disulfide) structures, transglutaminase-linked structures, cyclic peptides, β-loop
structures, helical barrels or 4-helix bundles, leucine zipper motifs, etc.

In a preferred embodiment, the presentation structure is a coiled-coil structure, allowing the
presentation of a peptide, especially a random peptide, on an exterior loop (see Myszka et al. (1994)
Biochem. 33: 2362-2373, hereby incorporated by reference). Using this system, investigators have
isolated peptides capable of high affinity interaction with the appropriate target. In general, coiled-coil
structures allow for between 6 and 20 randomized positions.

A preferred coiled-coil presentation structure is as follows:

MGCAALESEVSALESEVASLESEVAALGRGDMPLAAVKSKLSAVKSKLASVKSKL. The
underlined regions represent a coiled-coil leucine zipper region defined previously (see Martin et al.
(1994) EMBO J. 13: 5303-09, incorporated by reference). The bolded GRGDMP region represents the
loop structure and, when appropriately replaced with peptides (i.e., peptides, generally depicted herein
as (X)n, where X is an amino acid residue and n is an integer of at least 5 or 6), can be of variable
length. The replacement of the bolded region is facilitated by encoding restriction endonuclease sites
in the underlined regions, which allows the direct incorporation of oligonucleotides encoding peptides
of interest at these positions. For example, a preferred embodiment generates an XhoI site at the
double-underlined LE site and a HindIII site at the double-underlined KL site.
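The LE and KL positions can carry these sites because each recognition sequence can be read in frame as codons for the corresponding dipeptide: XhoI (CTCGAG) as CTC-GAG (Leu-Glu) and HindIII (AAGCTT) as AAG-CTT (Lys-Leu). The check below uses only the four codons it needs from the standard genetic code and also confirms that both sites are palindromic; it is a sanity check, not the cloning scheme of the original construct.

```python
# Minimal subset of the standard genetic code needed for this check.
CODON_TABLE = {"CTC": "L", "GAG": "E", "AAG": "K", "CTT": "L"}

# Recognition sequences of the two enzymes named in the text.
SITES = {"XhoI": "CTCGAG", "HindIII": "AAGCTT"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the small table above."""
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_palindromic(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site == site.translate(COMPLEMENT)[::-1]


for enzyme, site in SITES.items():
    print(enzyme, site, "->", translate(site), "palindromic:", is_palindromic(site))
```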

In a preferred embodiment, the presentation structure is a minibody structure. A "minibody" is
essentially composed of a minimal antibody complementarity region. The minibody presentation
structure generally provides two peptide regions that are presented along a single face of the tertiary
structure in the folded protein (see Bianchi et al. (1994) J. Mol. Biol. 236: 649-59, and references cited
therein, all of which are incorporated by reference). Investigators have shown this minimal domain is
stable in solution and have used phage selection systems in combinatorial libraries to select
minibodies with displayed peptide sequences exhibiting high affinity, = 1 0*^ for the pro-inflammatory
cytokine IL-6.

A preferred minibody presentation structure is as follows: 

i\1GRNSQATS GFT/=SHR ^iV1EWVRGGEYiAAS RHKHNKYT TEYSASVKGRYIVSRDTS^^ 
PP. The bold, underlined regions are the regions which may be replaced with a peptide or
randomized. The italicized phenylalanine must be invariant in the first peptide display region. The
entire peptide is cloned in a three-oligonucleotide variation of the coiled-coil embodiment, thus
allowing two different peptides of interest to be incorporated simultaneously. This embodiment utilizes
non-palindromic BstXI sites on the termini.

In a preferred embodiment, the presentation structure is a sequence that generally contains two
cysteine residues, such that a disulfide bond may be formed, resulting in a conformationally
constrained sequence. This embodiment is particularly preferred ex vivo, for example when secretory
targeting sequences are used. As will be appreciated by those in the art, any number of peptide
sequences, with or without spacer or linking sequences, may be flanked with cysteine residues. In
other embodiments, effective presentation structures may be generated by the peptides of interest
themselves. For example, the random peptides may be "doped" with cysteine residues which, under
the appropriate redox conditions, may result in highly crosslinked structured conformations, similar to
a presentation structure. Similarly, the randomization regions may be controlled to contain a certain
number of residues to confer β-sheet or α-helical structures.

In a preferred embodiment, the presentation sequence confers the ability to bind metal ions to confer
secondary structure. Thus, for example, C2H2 zinc finger sequences are used; C2H2 sequences
have two cysteines and two histidines placed such that a zinc ion is chelated. Zinc finger domains are
known to occur independently in multiple zinc-finger peptides to form structurally independent, flexibly
linked domains (see Nakaseko, Y. et al. (1992) J. Mol. Biol. 228: 619-36). A general consensus
sequence is (5 amino acids)-C-(2 to 3 amino acids)-C-(4 to 12 amino acids)-H-(3 amino acids)-H-(5
amino acids). A preferred example would be -FQCEEC-(peptide of 3 to 20 amino acids)-HIRSHTG-.
Similarly, CCHC boxes can be used that have a consensus sequence -C-(2 amino acids)-C-(4 to 20
amino acid peptide)-H-(4 amino acids)-C- (see Bavoso, A. et al. (1998) Biochem. Biophys. Res.
Commun. 242: 385-89, hereby incorporated by reference). Preferred examples include (1) -VKCFNC-
(4 to 20 amino acids)-HTARNCR-, based on the nucleocapsid protein P2; (2) a sequence modified from
that of the naturally occurring zinc-binding peptide of the Lasp-1 LIM domain (Hammarstrom, A. et al.
(1996) Biochemistry 35: 12723-32); and (3) -MNPNCARCG-(4 to 20 amino acid peptide)-HKACF-,
based on the NMR structural ensemble 1ZFP (Hammarstrom, A. et al., supra).
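The two consensus patterns above translate directly into regular expressions, which could be used to check that a designed presentation structure preserves the metal-binding spacing. The patterns below cover only the core spacings (the 5-residue flanks of the C2H2 consensus are omitted), and the GSGSGS inserts are hypothetical fillers.

```python
import re

# C2H2 core: C-(2 to 3)-C-(4 to 12)-H-(3)-H
C2H2_CORE = re.compile(r"C.{2,3}C.{4,12}H.{3}H")
# CCHC box: C-(2)-C-(4 to 20)-H-(4)-C
CCHC_BOX = re.compile(r"C.{2}C.{4,20}H.{4}C")


def has_motif(pattern: re.Pattern, sequence: str) -> bool:
    """True if the motif's spacing occurs anywhere in the sequence."""
    return pattern.search(sequence.upper()) is not None


c2h2_design = "FQCEEC" + "GSGSGS" + "HIRSHTG"  # hypothetical 6-residue insert
cchc_design = "VKCFNC" + "GSGSGS" + "HTARNCR"
print(has_motif(C2H2_CORE, c2h2_design), has_motif(CCHC_BOX, cchc_design))
```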

In a preferred embodiment, the presentation structure includes two dimerization sequences, including
self-binding peptides. A dimerization sequence allows the non-covalent association of two peptide
sequences, which can be the same or different, with sufficient affinity to remain associated under
normal physiological conditions. These sequences may be used in several ways. In a preferred
embodiment, one terminus of the protein or peptide is joined to a first dimerization sequence and the
other terminus is joined to a second dimerization sequence, which can be the same as or different from
the first sequence. This allows the formation of a loop upon association of the dimerizing sequences.
Alternatively, the use of these sequences effectively allows small libraries of peptides (for example,
10^4) to become large libraries if two peptides per cell are generated which then dimerize, to form an
effective library of 10^8 (10^4 × 10^4). It also allows the formation of longer protein or peptide
libraries, if needed, or more structurally complex peptide molecules. The dimers may be homo- or
heterodimers.

Dimerization sequences may be a single sequence that self-aggregates, or two different sequences
that associate. That is, nucleic acids may encode both a first peptide with dimerization sequence 1
and a second peptide with dimerization sequence 2, such that, upon introduction into a cell and
expression of the nucleic acids, dimerization sequence 1 associates with dimerization sequence 2 to
form a new peptide structure. The use of dimerization sequences allows the noncovalent "constraint"
of the displayed peptides; that is, if a dimerization sequence is used at each terminus of the peptide,
the resulting structure can form a constrained structure. Furthermore, the use of dimerizing sequences
fused to both the N- and C-termini of a scaffold such as rGFP or pGFP forms a noncovalently
constrained scaffold peptide library.

Suitable dimerization sequences will encompass a wide variety of sequences. Any number of
protein-protein interaction sites are known. In addition, dimerization sequences may also be
elucidated using standard methods such as the yeast two-hybrid system, traditional biochemical
affinity binding studies, or even the present methods (see, for example, WO 99/51625, hereby
incorporated by reference in its entirety). Particularly preferred dimerization peptide sequences
include, but are not limited to, -EFLIVKS-, -EEFLIVKKS-, -FESIKLV-, and -VSIKFEL-. More preferred
dimerization peptide sequences include EEEFLIVEEE when used together with KKKFLIVKKK.

In a preferred embodiment, the fusion partner is a targeting sequence. As will be appreciated by
those in the art, the localization of proteins within a cell is a simple method for increasing effective
concentration within a defined compartment. For example, RAF1, when localized to the mitochondrial
membrane, can inhibit the anti-apoptotic effect of BCL-2. Similarly, membrane-bound Sos induces
Ras-mediated signaling in T lymphocytes. These mechanisms are thought to rely on the principle of
increasing the protein concentration in a limited volume within a cell; that is to say, the localization of a
protein to the plasma membrane limits the search for its ligand to the lower-dimensional space near
the membrane, as opposed to the three-dimensional space of the cytoplasm. Alternatively, the
concentration of a protein can also be increased simply by virtue of the localization. Shuttling the
proteins into the nucleus confines them to a smaller space, thereby increasing concentration. Finally,
the ligand or target may simply be present in a specific compartment, such that effectors (e.g.,
inhibitors) must be localized appropriately.

Thus, suitable targeting sequences include, but are not limited to, binding sequences capable of
causing binding of the expression product to a predetermined molecule or class of molecules while
retaining bioactivity of the expression product (for example, by using enzyme inhibitor or substrate
sequences to target a class of relevant enzymes); sequences signaling selective degradation of itself
or co-bound proteins; and signal sequences capable of constitutively localizing the peptides to a
predetermined cellular locale, including a) subcellular locations such as the Golgi, endoplasmic
reticulum, nucleus, nucleoli, nuclear membrane, mitochondria, chloroplast, secretory vesicles,
lysosome, periplasmic space and cellular membrane; and b) extracellular locations via a secretory
signal. Particularly preferred is localization to either subcellular locations or to the outside of the cell
via secretion.

In a preferred embodiment, the targeting sequence is a nuclear localization signal (NLS). NLSs are
generally short, positively charged (basic) domains that serve to direct the entire protein in which they
occur to the cell's nucleus. Numerous NLS amino acid sequences have been reported, including
single basic NLSs such as that of the SV40 (monkey virus) large T antigen (PKKKRKV, Kalderon, D.
et al. (1984) Cell 39: 499-509), the human retinoic acid receptor-β nuclear localization signal
(ARRRRP), NFκB p50 (EEVQRKRQKL, Ghosh, S. et al. (1990) Cell 62: 1019-29), NFκB p65
(EEKRKRTYE, Nolan, G. et al. (1991) Cell 64: 961-69), and others (see, for example, Boulikas, T.
(1994) J. Cell. Biochem. 55: 32-58, hereby incorporated by reference), as well as double basic NLSs
exemplified by that of the Xenopus (African clawed frog) protein nucleoplasmin
(AVKRPAATKKAGQAKKKKLD, Dingwall, C. et al. (1982) Cell 30: 449-58, and Dingwall, S. et al.
(1988) J. Cell Biol. 107: 641-49). Numerous localization studies have demonstrated that NLSs
incorporated in synthetic peptides or grafted onto proteins not normally targeted to the cell nucleus
cause these peptides and proteins to concentrate in the nucleus (see Dingwall, S. et al. (1986) Ann.
Rev. Cell Biol. 2: 367-90; Bonnerot, C. et al. (1987) Proc. Natl. Acad. Sci. USA 84: 6795-99; Galileo,
D.S. et al. (1990) Proc. Natl. Acad. Sci. USA 87: 458-62).
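As a rough illustration of how the NLS sequences listed above might be handled computationally, the following Python sketch scans a protein sequence for exact matches to those motifs and grafts one onto a protein of interest. The dictionary, the helper names `find_known_nls` and `add_n_terminal_nls`, and the choice to place the NLS directly after the initiating methionine are illustrative assumptions, not part of the described method; real NLS prediction tolerates sequence variation that exact matching ignores.

```python
# Hypothetical helpers; the motif strings are the NLS sequences cited in the text.
KNOWN_NLS = {
    "SV40 large T": "PKKKRKV",
    "RAR-beta": "ARRRRP",
    "NFkB p50": "EEVQRKRQKL",
    "NFkB p65": "EEKRKRTYE",
    "nucleoplasmin (bipartite)": "AVKRPAATKKAGQAKKKKLD",
}


def find_known_nls(protein: str) -> list:
    """Return (name, 0-based start) for every exact occurrence of a listed NLS."""
    seq = protein.upper()
    hits = []
    for name, motif in KNOWN_NLS.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((name, start))
            start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


def add_n_terminal_nls(protein: str, name: str = "SV40 large T") -> str:
    """Graft an NLS immediately after the initiating Met (an assumed placement)."""
    body = protein[1:] if protein.startswith("M") else protein
    return "M" + KNOWN_NLS[name] + body


fused = add_n_terminal_nls("MSKGEELF")  # placeholder protein sequence
print(fused)                  # MPKKKRKVSKGEELF
print(find_known_nls(fused))  # [('SV40 large T', 1)]
```

Exact substring matching is used only to keep the example transparent; it will miss natural NLS variants that differ from the listed sequences by even one residue.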
Membrane-anchoring sequences are well known in the art and are based on the genetic geometry of
mammalian transmembrane molecules. Peptides are inserted into the membrane via a signal
sequence (designated herein as ssTM) and stably held in the membrane through a hydrophobic
transmembrane domain (TM). Transmembrane proteins are positioned in the membrane such
that the region of the protein on the amino-terminal side of the transmembrane domain is
extracellular and the region on the carboxy-terminal side is intracellular. Of course, if the
transmembrane domain is placed toward the amino end of the protein relative to the peptide of interest, the
TM will serve to position the peptide intracellularly, which may be desirable in some embodiments.
ssTMs and TMs are known for a wide variety of membrane-bound proteins, and these sequences are
used accordingly, either as pairs from a particular protein or with each component taken from a
different protein. Alternatively, the ssTM and TM sequences are synthetic and derived entirely from
consensus sequences, thus serving as artificial delivery domains.

Membrane-anchoring sequences are well known in the art and are based on the mammalian
transmembrane molecules. Peptides are inserted into the membrane based on a signal sequence
(designated herein as ssTM) and require a hydrophobic transmembrane domain (herein TM). The
transmembrane proteins are inserted into the membrane such that the region N-terminal to the TM
domain is extracellular and the sequences C-terminal to the TM become intracellular. Of course, if
these transmembrane domains are placed 5' of the variable region, they will serve to anchor it as an
intracellular domain, which may be desirable in some embodiments. ssTMs and TMs are known for a
wide variety of membrane-bound proteins, and these sequences may be used accordingly, either as
pairs from a particular protein or with each component taken from a different protein.
Alternatively, the sequences may be synthetic and derived entirely from consensus sequences for use
as artificial delivery domains.

As will be appreciated by those in the art, membrane-anchoring sequences, including both ssTM and
TM, are known for a wide variety of proteins, and any of these are useful in the present invention.
Particularly preferred membrane-anchoring sequences include, but are not limited to, those derived
from CD8, ICAM-2, IL-8R, CD4 and LFA-1. Other useful ssTM and TM domains include sequences
from: (a) class I integral membrane proteins such as the IL-2 receptor beta-chain (residues 1-26 are the
signal sequence, 241-265 are the transmembrane residues; see Hatakeyama, M. et al. (1989)
Science 244: 551-56 and von Heijne, G. et al. (1988) Eur. J. Biochem. 174: 671-78) and the insulin
receptor β chain (residues 1-27 are the signal domain, 957-959 are the transmembrane domain and
960-1382 are the cytoplasmic domain; see Hatakeyama, supra, and Ebina, Y. et al. (1985) Cell 40:
747-58); (b) class II integral membrane proteins such as neutral endopeptidase (residues 29-51 are
the transmembrane domain, 2-28 are the cytoplasmic domain; see Malfroy, B. et al. (1987) Biochem.
Biophys. Res. Commun. 144: 59-66); (c) type III proteins such as human cytochrome P450 NF25
(Hatakeyama, supra); and (d) type IV proteins such as human P-glycoprotein (Hatakeyama, supra).
Particularly preferred are CD8 and ICAM-2. For example, the signal sequences from CD8 and ICAM-2
lie at the extreme 5' end of the transcript. These consist of amino acids 1-32 in the case of CD8
(MASPLTRFLSLNLLLLGESIILGSGEAKPQAP, Nakauchi, H. et al. (1985) Proc. Natl. Acad. Sci. USA
82: 5126-30) and amino acids 1-21 in the case of ICAM-2 (MSSFGYRTLTVALFTLICCPG, Staunton,
D.E. et al. (1989) Nature 339: 61-64). These leader sequences deliver the construct to the membrane,
while the hydrophobic transmembrane domains, placed at the carboxy-terminal region relative to the
peptide of interest or peptide candidate agents, serve to anchor the construct in the membrane. These
transmembrane domains are encompassed by amino acids 145-195 from CD8
(PQRPEDCRPRGSVKGTGLDFACDIYIWAPLAGICVALLLSLIITLICYHSR, Nakauchi, supra) and 224-256
from ICAM-2 (MVIIVTVVSVLLSLFVTSVLLCFIFGQHLRQQR, Staunton, supra).
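The domain arrangement described above (leader at the amino end, peptide in the middle, transmembrane domain at the carboxy end) can be sketched in Python as simple string assembly. The segment sequences are those quoted in the text; the function name `membrane_display_construct` is hypothetical, and the sketch ignores signal-peptide cleavage and any linkers. The length checks confirm that each quoted segment spans exactly the residue range given for it.

```python
# Leader (ssTM) and transmembrane (TM) segments quoted in the text.
ANCHORS = {
    "CD8": {
        "ssTM": "MASPLTRFLSLNLLLLGESIILGSGEAKPQAP",                    # aa 1-32
        "TM": "PQRPEDCRPRGSVKGTGLDFACDIYIWAPLAGICVALLLSLIITLICYHSR",  # aa 145-195
    },
    "ICAM-2": {
        "ssTM": "MSSFGYRTLTVALFTLICCPG",                               # aa 1-21
        "TM": "MVIIVTVVSVLLSLFVTSVLLCFIFGQHLRQQR",                     # aa 224-256
    },
}


def membrane_display_construct(peptide: str, leader_from: str = "CD8",
                               tm_from: str = "CD8") -> str:
    """ssTM + peptide + TM; leader and TM may be taken from different proteins."""
    return ANCHORS[leader_from]["ssTM"] + peptide + ANCHORS[tm_from]["TM"]


# Each quoted segment should span exactly the stated residue range.
for protein, first, last, part in [("CD8", 1, 32, "ssTM"), ("CD8", 145, 195, "TM"),
                                   ("ICAM-2", 1, 21, "ssTM"), ("ICAM-2", 224, 256, "TM")]:
    assert len(ANCHORS[protein][part]) == last - first + 1

# "GGSWWWW" is a placeholder peptide of interest.
construct = membrane_display_construct("GGSWWWW", leader_from="ICAM-2", tm_from="CD8")
print(len(construct))  # 21 + 7 + 51 = 79
```

Mixing an ICAM-2 leader with a CD8 transmembrane domain mirrors the option, noted earlier, of taking each component from a different protein.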

Alternatively, membrane-anchoring sequences include the GPI anchor, which results in a covalent
bond between the molecule and the lipid bilayer via a glycosyl-phosphatidylinositol bond. The GPI
anchor sequence is exemplified by the protein DAF, which comprises the sequence
PNKGSGTTSGTTRLLSGHTCFTLTGLLGTLVTMGLLT, with the bolded serine being the site of the anchor
(see Homans, S.W. et al. (1988) Nature 333: 269-72, and Moran, P. et al. (1991) J. Biol. Chem. 266:
1250-57). GPI anchor sites are added by inserting the GPI sequence from Thy-1 in the
carboxy-terminal region relative to the inserted peptide of interest or randomized peptide. Thus, the GPI
anchor sequence replaces the transmembrane domain in these constructs.

Similarly, acylation signals for attachment of lipid moieties can also serve as membrane-anchoring
sequences (see Stickney, J.T. (2001) Methods Enzymol. 332: 64-77). It is known that the
myristylation of c-src localizes the kinase to the plasma membrane. This property provides a simple
and effective method of membrane localization, given that the first 14 amino acids of the protein are
solely responsible for this function: MGSSKSKPKDPSQR (see Cross, F.R. et al. (1984) Mol. Cell. Biol.
4: 1834-42; Spencer, D.M. et al. (1993) Science 262: 1019-24, both of which are hereby incorporated
by reference) or MGQSLTTPLSL. The modification at the glycine residue of the motif is
effective in localizing reporter genes and can be used to anchor the zeta chain of the TCR. The
myristylation signal motif is placed at the amino end relative to the variable region (or protein of
interest) in order to localize the construct to the plasma membrane. Another lipid modification is
isoprenoid attachment, which includes the 15-carbon farnesyl or the 20-carbon geranylgeranyl group.
The conserved sequence for isoprenoid attachment comprises a CaaX motif, with the cysteine residue
as the lipid-modified amino acid. The X residue determines the type of isoprenoid modification. The
preferred isoprenoid is geranylgeranyl when X is leucine or phenylalanine (Farnsworth, C.C. et al.
(1994) Proc. Natl. Acad. Sci. USA 91: 11963-67). Farnesyl is the preferred lipid for a broader range of
X amino acids, such as methionine, serine, glutamine and alanine. The "aa" in the isoprenoid
attachment motif are generally aliphatic residues, although other residues are also functional.
Farnesylation sequences include the carboxy-terminal SKDGKKKKKKSKTKCVIM of K-Ras4B. Other
isoprenoid attachment motifs are found in the C termini of the N- and H-Ras GTPases (Aronheim, A. et al.
(1994) Cell 78: 949-61). Attachment of farnesyl groups to various forms of GFP provides a useful
marker for monitoring cell membrane morphology and for cell sorting by FACS. Moreover, cells retain the
farnesylated forms upon treatment with fixative, while cytoplasmic forms of GFP may leach out
of the cell.
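The CaaX rule stated above can be expressed as a small lookup. This Python sketch encodes only the residue preferences given in the text (leucine or phenylalanine at X favouring geranylgeranyl; methionine, serine, glutamine or alanine favouring farnesyl). It does not check whether the "aa" positions are aliphatic, and real prenyltransferase specificity is broader than this table. The function name `predict_prenyl_group` is hypothetical.

```python
from typing import Optional

# Residue preferences at the X position, as stated in the text.
GERANYLGERANYL_X = {"L", "F"}
FARNESYL_X = {"M", "S", "Q", "A"}


def predict_prenyl_group(protein: str) -> Optional[str]:
    """Classify a C-terminal CaaX box; return None if there is no Cys at position -4."""
    tail = protein.upper()[-4:]
    if len(tail) < 4 or tail[0] != "C":
        return None
    x = tail[3]
    if x in GERANYLGERANYL_X:
        return "geranylgeranyl"
    if x in FARNESYL_X:
        return "farnesyl"
    return "unclassified"  # X residue not covered by the rule in the text


print(predict_prenyl_group("SKDGKKKKKKSKTKCVIM"))  # farnesyl (K-Ras4B tail)
```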

In addition, localization to the cell membrane by lipid modification is also achieved by palmitoylation.
Attachment of the palmitoyl group can be directed to either the amino- or carboxy-terminal region
relative to the protein of interest. In addition, multiple palmitoyl residues or combinations of palmitoyl
and isoprenoid groups are possible. Amino-terminal additions of palmitoyl groups may use the sequence
MVCCMRRTKQV from the Gap43 protein, while carboxy-terminal modifications are possible with
CMSCKCVLKKKKKK from a Ras mutant (modified amino acids in bold). Other palmitoylation
sequences are found in the G protein-coupled receptor kinase GRK6 sequence
(LLQRLFSRQDCCGNCSDSEEELPTRL, Stoffel, R.H. et al. (1994) J. Biol. Chem. 269: 27791-94),
rhodopsin (KQFRNCMLTSLCCGKNPLGD, Barnstable, C.J. et al. (1994) J. Mol. Neurosci. 5: 207-09)
and the p21 H-ras 1 protein (LNPPDESGPGCMSCKCVLS, Capon, D.J. et al. (1983) Nature 302: 33-37).
Use of the carboxy-terminal sequence LNPPDESGPGC(p)MSC(p)KC(f)VLS of H-Ras (modified
amino acids in bold; p is a palmitoyl group and f is a farnesyl group) allows attachment of both palmitoyl
and farnesyl lipids.

In a preferred embodiment, the targeting sequence is a lysosomal targeting sequence, including, for
example, a lysosomal degradation sequence such as Lamp-2 (KFERQ, Dice, J.F. (1992) Ann. N.Y.
Acad. Sci. 674: 58-64), or lysosomal membrane sequences from Lamp-1
(MLIPIAGFFALAGLVLIVLIAYLIGRKRSHAGYQTI, Uthayakumar, S. et al. (1995) Cell. Mol. Biol. Res.
41: 405-20) or Lamp-2 (LVPIAVGAALAGVLILVLLAYFIGLKHHHAGYEQF, Konecki, D.S. et al. (1994)
Biochem. Biophys. Res. Comm. 205: 1-5; where italicized residues comprise the transmembrane
domains and underlined residues comprise the cytoplasmic targeting signal).

Alternatively, the targeting sequence may be a mitochondrial localization sequence, including
mitochondrial matrix sequences (e.g., yeast alcohol dehydrogenase III;
MLRTSSLFTRRVQPSLFSRNILRLQST, Schatz, G. (1987) Eur. J. Biochem. 165: 1-6); mitochondrial
inner membrane sequences (yeast cytochrome c oxidase subunit IV;
MLSLRQSIRFFKPATRTLCSSRYLL, Schatz, supra); mitochondrial intermembrane space sequences
(yeast cytochrome c1;
MFSMLSKRWAQRTLSKSFYSTATGAASKSGKLTQKLVTAGVAAAGITASTLLYADSLTAEAMTA,
Schatz, supra); or mitochondrial outer membrane sequences (yeast 70 kD outer membrane protein;
MKSFITRNKTAILATVAATGTAIGAYYYYNQLQQQQQRGKK, Schatz, supra).

The targeting sequences may also be endoplasmic reticulum sequences, including the sequences from
calreticulin (KDEL, Pelham, H.R. (1992) Royal Society London Transactions B: 1-10) or the adenovirus
E3/19K protein (LYLSRRSFIDEKKMP, Jackson, M.R. et al. (1990) EMBO J. 9: 3153-62).
Furthermore, targeting sequences also include peroxisome sequences (for example, the peroxisome
matrix sequence of luciferase, SKL; Keller, G.A. et al. (1987) Proc. Natl. Acad. Sci. USA 84: 3264-68)
or destruction sequences (cyclin B1, RTALGDIGN; Klotzbucher, A. et al. (1996) EMBO J. 15: 3053-64).

In a preferred embodiment, the targeting sequence is a secretory signal sequence capable of effecting
the secretion of the peptide of interest or peptide candidate agent. A large number of secretory
signal sequences are known that direct secretion of the peptide into the extracellular space
when placed at the amino end relative to the peptide of interest. Secretory signal sequences and their
transferability to unrelated proteins are well known (see Silhavy, T.J. et al. (1985) Microbiol. Rev. 49:
398-418). Secretion of the peptide is particularly useful for generating peptides capable of binding to the
surface of, or affecting the physiology of, target cells other than the host cell, e.g., the cell infected with
the retrovirus. In a preferred approach, a fusion product is configured to contain, in series, a secretion
signal peptide, a presentation structure, a randomized peptide region or protein of interest, and a second
presentation structure. In this manner, target cells grown in the vicinity of cells expressing the library of
peptides are exposed to the secreted peptide. Target cells exhibiting a physiological change in response to the
presence of the secreted peptide (i.e., through the peptide binding to a surface receptor or being
internalized and binding to intracellular targets), and the peptide-secreting cells, are localized by any of
a variety of selection schemes, and the structure of the peptide effector is identified. Exemplary effects
include that of a designer cytokine (i.e., a stem cell factor capable of causing hematopoietic stem cells
to divide and maintain their totipotential), a factor causing cancer cells to undergo spontaneous
apoptosis, a factor that binds to the cell surface of target cells and labels them specifically, etc.

Suitable secretory sequences are known, including signals from IL-2 (MYRMQLLSCIALSLALVTNS,
Villinger, F. et al. (1995) J. Immunol. 155: 3946-54), growth hormone
(MATGSRTSLLLAFGLLCLPWLQEGSAFPT, Roskam, W.G. et al. (1979) Nucleic Acids Res. 7: 305-20),
preproinsulin (MALWMRLLPLLALLALWGPDPAAAFVN, Bell, G.I. et al. (1980) Nature 284: 26-32) and
influenza HA protein (MKAKLLVLLYAFVAGDQI, Sekikawa, K. et al. (1983) Proc. Natl. Acad.
Sci. USA 80: 3563-67), with cleavage at the junction between the non-underlined and underlined
residues. A particularly preferred secretory signal sequence is the signal leader sequence from the
secreted cytokine IL-4, MGLTSQLLPPLFFLLACAGNFVHG, which comprises the first 24 amino acids of IL-4.
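The in-series arrangement described earlier for secreted peptides (secretion signal, presentation structure, peptide, presentation structure) can be combined with the IL-4 leader in a short Python sketch. The presentation-structure sequences are left as caller-supplied placeholders because the text does not specify them here, and `secreted_fusion` is a hypothetical name.

```python
IL4_LEADER = "MGLTSQLLPPLFFLLACAGNFVHG"  # first 24 amino acids of IL-4, per the text


def secreted_fusion(peptide: str, presentation_n: str, presentation_c: str,
                    leader: str = IL4_LEADER) -> str:
    """Assemble leader - presentation structure - peptide - presentation structure."""
    return leader + presentation_n + peptide + presentation_c


assert len(IL4_LEADER) == 24
# "AAAA" and "CCCC" stand in for real presentation structures.
print(secreted_fusion("WWW", "AAAA", "CCCC"))
```

Because the leader is a keyword argument, any of the other signal sequences listed above could be swapped in without changing the assembly order.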

In a preferred embodiment, the fusion partner is a rescue sequence. A rescue sequence is a
sequence which may be used to purify or isolate either the peptide of interest or the candidate agent
or the nucleic acid encoding it. Thus, for example, peptide rescue sequences include purification
sequences such as the His6 tag for use with Ni2+ affinity columns and epitope tags useful for detection,
immunoprecipitation or FACS (fluorescence-activated cell sorting). Suitable epitope tags include myc
(for use with the commercially available 9E10 antibody), the BSP biotinylation target sequence of the
bacterial enzyme BirA, flu tags, lacZ, GST, and Strep tag I and II.
Alternatively, the rescue sequence may be a unique oligonucleotide sequence which serves as a
probe target site to allow quick and easy isolation of the retroviral construct via PCR, related
techniques, or hybridization.

In a preferred embodiment, the fusion partner is a stability sequence that affects the stability of the
peptide of interest or candidate bioactive agent. In one aspect, the stability sequence confers stability
on the peptide of interest or candidate bioactive agent. For example, peptides may be stabilized by the
incorporation of glycines after the initiating methionine (MG or MGG), which protects the peptide from
ubiquitination under Varshavsky's N-End Rule, thus conferring an increased half-life in the cell (see
Varshavsky, A. (1996) Proc. Natl. Acad. Sci. USA 93: 12142-49). Similarly, adding two prolines at the
C-terminus makes peptides largely resistant to carboxypeptidase action. The presence of two
glycines before the prolines both imparts flexibility and prevents structure-perturbing events in the
di-proline from propagating into the peptide structure. Thus, preferred stability sequences are
MG(X)nGGPP, where X is any amino acid and n is an integer of at least four.
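The preferred stability format MG(X)nGGPP can be written as a template plus a validating pattern. In this Python sketch, `stabilize` and `STABLE_FORMAT` are hypothetical names, X is assumed to be restricted to the 20 standard one-letter amino acid codes, and the requirement n >= 4 follows the text.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# MG, then at least four residues X, then the GG spacer and the di-proline cap.
STABLE_FORMAT = re.compile(rf"^MG[{AMINO_ACIDS}]{{4,}}GGPP$")


def stabilize(core: str) -> str:
    """Wrap a peptide core as MG(X)nGGPP, requiring n >= 4 standard residues."""
    core = core.upper()
    if len(core) < 4 or any(aa not in AMINO_ACIDS for aa in core):
        raise ValueError("core must be at least four standard amino acids")
    return "MG" + core + "GGPP"


peptide = stabilize("rwkfh")
print(peptide, bool(STABLE_FORMAT.match(peptide)))  # MGRWKFHGGPP True
```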

In another aspect, the stability sequence decreases the stability of the peptide of interest or candidate
bioactive agent. Sequences such as PEST sequences (i.e., polypeptide sequences enriched in
proline (P), glutamic acid (E), serine (S) and threonine (T); see Rechsteiner, M. (1996) Trends
Biochem. Sci. 21: 267-71) and destruction boxes (Glotzer, M. (1991) Nature 349: 132-38) destabilize
proteins by targeting them for degradation. For example, fusion of PEST sequences to the GFP
reporter protein decreases the half-life of GFP, thus providing an indicator of dynamic cellular
processes, including, but not limited to, regulated protein degradation, reporting of transcriptional
activity, and cell cycle status (Mateus, C. et al. (2000) Yeast 16: 1313-23; Li, X. (1998) J. Biol. Chem.
273: 34970-75). Numerous PEST sequences useful for targeting peptides for degradation are known.
These include amino acids 422-461 of ornithine decarboxylase (Corish, P. (1999) Protein Eng. 12:
1035-40; Li, X. et al., US Pat. No. 6,130,313) and the C-terminal sequences of IκBα (Lin, R. (1996)
Mol. Cell. Biol. 16: 1401-09). Destruction boxes found in cell cycle proteins, for example cyclin B1, can
also reduce the half-life of fusion proteins, but in a cell cycle-dependent manner (Corish, P., supra).
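A crude way to see what "enriched in P, E, S and T" means is to slide a window along a sequence and report the fraction of those four residues. The following Python sketch does only that. It is not the published PEST-scoring algorithm, which also weighs hydrophobicity and flanking basic residues, and `pest_fraction` and `richest_pest_window` are hypothetical names.

```python
def pest_fraction(segment: str) -> float:
    """Fraction of residues in the segment that are P, E, S or T."""
    segment = segment.upper()
    if not segment:
        raise ValueError("empty segment")
    return sum(aa in "PEST" for aa in segment) / len(segment)


def richest_pest_window(protein: str, size: int = 12):
    """Return (start, fraction) of the first window with the highest P/E/S/T content."""
    if not 0 < size <= len(protein):
        raise ValueError("window size must be between 1 and the sequence length")
    starts = range(len(protein) - size + 1)
    best = max(starts, key=lambda i: pest_fraction(protein[i:i + size]))
    return best, pest_fraction(protein[best:best + size])


# Placeholder sequence with a P/E/S/T-rich stretch at positions 4-11.
print(richest_pest_window("GGGGSPTESPEEKKKK", size=8))  # (4, 1.0)
```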

The fusion partners may be placed anywhere in the structure (i.e., N-terminal, C-terminal, or in internal
loops) as the biology and activity permit. In addition, while the discussion has been directed to the
fusion of fusion partners to the peptide or protein of interest of the fusion polypeptide, it is also
possible to fuse one or more of these fusion partners to the rGFP or pGFP portion of the fusion
polypeptide. Thus, for example, the rGFP or pGFP may contain a targeting sequence (in the N-terminal
region, C-terminal region, or an internal region, as described above) at one location, and a
rescue sequence in the same place or a different place on the molecule. Thus, any combination of
fusion partners, peptides of interest, and rGFP or pGFP proteins may be made.

In a preferred embodiment, the fusion partner includes a linker or spacer sequence. Linker
sequences between various targeting sequences (for example, membrane targeting sequences) and
the other components of the constructs (such as the randomized peptides) may be desirable to allow
the peptides to interact with potential targets unhindered. For example, useful linkers include glycine
polymers (G)n, glycine-serine polymers (including, for example, (GS)n, (GSGGS)n and (GGGS)n,
where n is an integer of at least one), glycine-alanine polymers, alanine-serine polymers, and other
flexible linkers such as the tether for the Shaker potassium channel, as well as a large variety of other
flexible linkers, as will be appreciated by those in the art. Glycine and glycine-serine polymers are
preferred since both of these amino acids are relatively unstructured, and therefore may be able to
serve as a neutral tether between components. Glycine polymers are the most preferred, as glycine
accesses significantly more phi-psi space than even alanine and is much less restricted than residues
with longer side chains (see Scheraga, H.A. (1992) Rev. Computational Chem. III: 73-142). Second,
serine is hydrophilic and therefore able to solubilize what could otherwise be a globular glycine chain. Third,
similar chains have been shown to be effective in joining subunits of recombinant proteins such as
single-chain antibodies.
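Because the preferred linkers are simple repeats such as (G)n, (GS)n, (GSGGS)n and (GGGS)n, building them amounts to repeating a unit. The Python sketch below generates a repeat and joins construct components with it; `linker` and `join_with_linker` are hypothetical helper names, and the component sequences in the usage lines are placeholders.

```python
def linker(unit: str, n: int) -> str:
    """Return (unit)n, e.g. (GGGS)2 -> GGGSGGGS, for n >= 1."""
    if not unit:
        raise ValueError("unit must be non-empty")
    if n < 1:
        raise ValueError("n must be at least one")
    return unit * n


def join_with_linker(components, unit: str = "GGGS", n: int = 1) -> str:
    """Place the same (unit)n linker between consecutive construct components."""
    return linker(unit, n).join(components)


print(linker("GGGS", 2))                                     # GGGSGGGS
print(join_with_linker(["AAAA", "WWWW"], unit="GS", n=2))    # AAAAGSGSWWWW
```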

In a preferred embodiment, the peptide is connected to the rGFP or pGFP via linkers. That is, while
one embodiment utilizes direct linkage of the peptide of interest to the rGFP or pGFP, or of the
peptide and any fusion partners to the rGFP or pGFP protein, a preferred embodiment utilizes linkers
at one or both ends of the peptide. That is, when the peptide is attached to either the N- or C-terminus,
one linker may be used. When the peptide of interest is inserted at an internal position, as is generally
outlined above, preferred embodiments utilize at least one linker and preferably two, one at each terminus of
the peptide. Linkers are generally preferred for conformationally decoupling any insertion sequence
(i.e., the peptide) from the scaffold structure itself, to minimize local distortions in the scaffold structure
that can either destabilize folding intermediates or allow access to the GFP's buried tripeptide
fluorophore, which decreases (or eliminates) rGFP or pGFP fluorescence due to exposure to
exogenous collisional fluorescence quenchers (see Phillips, G.N. (1997) Curr. Opin. Struct. Biol. 7:
821-27, hereby incorporated by reference in its entirety).

Accordingly, as outlined below, when the peptides are inserted into internal positions in the rGFP or
pGFP protein, preferred embodiments utilize linkers, and preferably (Gly)n linkers, where n is 1 or
more, with n being two, three, four, five or six, although linkers of 7-10 or more amino acids are also
possible. Generally, in this embodiment, no amino acids with β-carbons are used in the linkers.
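Insertion of a peptide into an internal loop with a (Gly)n linker on each side can be sketched as a string splice. The insertion position is a caller-supplied, hypothetical index (the loop positions of rGFP and pGFP are not given in this passage), `insert_into_loop` is an illustrative name, and the default n = 4 is one of the values listed above. Glycine is used because, as noted, it has no β-carbon.

```python
def insert_into_loop(scaffold: str, position: int, peptide: str, n_gly: int = 4) -> str:
    """Insert `peptide` after the first `position` scaffold residues,
    flanked on both sides by (Gly)n linkers."""
    if not 0 < position < len(scaffold):
        raise ValueError("position must fall strictly inside the scaffold")
    if n_gly < 1:
        raise ValueError("n must be at least one")
    flank = "G" * n_gly
    return scaffold[:position] + flank + peptide + flank + scaffold[position:]


# Placeholder scaffold and peptide; not real rGFP/pGFP sequence.
print(insert_into_loop("MSKAAAAVVV", 3, "WRW", n_gly=2))  # MSKGGWRWGGAAAAVVV
```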

In addition, the fusion partners, including presentation structures, may be modified, randomized,
and/or matured to alter the presentation orientation of the randomized expression product. For
example, determinants at the base of the loop may be modified to slightly modify the internal loop
peptide tertiary structure, to properly display the protein or peptide of interest.

In a preferred embodiment, combinations of fusion partners are used. Thus, for example, any number
of combinations of peptides of interest, presentation structures, targeting sequences, rescue
sequences, and stability sequences may be used, with or without linker sequences. As will be
appreciated by those in the art, using a base vector that contains a cloning site for inserting various
peptides, a person skilled in the art can cassette in various fusion partners. In addition, as discussed
herein, it is possible to have more than one peptide of interest in a construct, either together to form a
new surface or to bring two other molecules together. Similarly, as described below, it is possible to
have peptides inserted at two or more different loops of the rGFP or pGFP protein, preferably, but not
necessarily, on the same 'face' of the GFP protein.

In view of the foregoing, the present invention further relates to fusion nucleic acids for encoding and
expressing the proteins described above. By "fusion nucleic acid" herein is meant a plurality of
nucleic acid components that are joined together, either directly or indirectly. As will be appreciated by
those in the art, in some embodiments the sequences described herein may be DNA, for example
when extrachromosomal plasmids are used, or RNA when retroviral vectors are used. In some
embodiments, the sequences are directly linked together without any linking sequences, while in other
embodiments linkers, such as restriction endonuclease cloning sites or linkers encoding flexible amino
acids (e.g., the glycine or serine linkers known in the art), are used, as discussed above. In
addition, the fusion nucleic acids may further comprise substitutions to codon-optimize the nucleic acid
for expression of the encoded proteins in a particular target organism.

To facilitate the generation of fusion polypeptides comprising rGFP or pGFP, the present invention
further provides rGFP or pGFP fusion nucleic acids with a multiple cloning site (MCS) inserted into
the rGFP or pGFP nucleic acid sequences at about the amino-terminal region, the carboxy-terminal
region, or at least one loop as outlined above, or combinations thereof. The presence of an MCS
facilitates generation of fusion constructs, including cDNA, genomic DNA, and random peptide fusion
libraries. When the MCS is at the amino-terminal region, the MCS may contain its own translation
initiation sequence to regulate translation of inserted nucleic acids lacking their own translation initiation
sites (e.g., random peptide sequences). Alternatively, when the MCS is present downstream of the
initiating amino acid (i.e., methionine) near the amino-terminal region, or at the carboxy-terminal or
internal loops of rGFP or pGFP, the translation initiation sequences of rGFP or pGFP are generally
used.

In the present invention, the fusion nucleic acids further comprise expression vectors for expressing
the proteins of the present invention. The expression vectors may be either self-replicating
extrachromosomal vectors or vectors which integrate into a host genome. Generally, these
expression vectors include control sequences operably linked to the nucleic acid encoding the protein.
The term "control sequences" refers to DNA sequences necessary for the expression of an operably
linked coding sequence in a particular host organism. Thus, control sequences include sequences
required for transcription and translation of the nucleic acids, which are selected with reference to the
target organism used for expressing the proteins. For example, for prokaryotes, the sequences
include a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are
known to utilize promoters, polyadenylation signals, and enhancers.

Nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic
acid sequence. In the present context, operably linked means that the control sequences, such as
transcription and translation regulatory sequences, are positioned relative to the coding sequence in
such a manner that expression of the encoded protein occurs. For example, a promoter or enhancer
is operably linked to a coding sequence if it affects the transcription of the sequence, and a ribosome
binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation.
Where the fusion nucleic acid encodes a fusion protein, for example a protein linked to a secretory
leader sequence, the DNA for the secretory leader is operably linked to DNA for a polypeptide if it is
expressed in a manner resulting in secretion of the polypeptide.

In general, the transcriptional and translational regulatory sequences may include, but are not limited
to, promoter sequences, enhancer or transcriptional activator sequences, ribosomal binding sites,
CAP sequences, transcriptional start and stop sequences, and translational start and stop sequences.
In a preferred embodiment, the regulatory sequences include a promoter and transcriptional start and
stop sequences.

Promoter sequences are either constitutive or inducible promoters. By "promoter" herein is meant
nucleic acid sequences capable of initiating transcription of the fusion nucleic acid or portions thereof.
Promoters may be constitutive, wherein the transcription level is constant and unaffected by
modulators of promoter activity. Promoters may be inducible, in that promoter activity is capable of
being increased or decreased, for example as measured by the presence or quantitation of transcripts
or translation products (see Walter, W. et al. (1996) J. Mol. Med. 74: 379-92). Promoters may also be
cell specific, wherein the promoter is active only in particular cell types. Thus, promoter as defined
herein includes sequences required for initiating and regulating the transcription level and transcription
in specific cell types. Furthermore, the promoters may be naturally occurring promoters, hybrid
promoters which combine elements of more than one promoter, or synthetic promoters based on
consensus sequences of known promoters.

The fusion nucleic acid comprising the expression vector may comprise additional elements. For
example, the expression vector may have two replication systems, thus allowing it to be maintained in
two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for
cloning and amplification. Furthermore, for integration into host chromosomal elements, the
expression vector may contain sequences necessary for the integration process. The integration
sequences used will depend on the integration mechanism. For homologous recombination, a
sequence homologous to specific regions of a host cell genome is incorporated into the fusion nucleic
acid, as is well known in the art. Preferably, two homologous sequences flank the expression construct or
the region to be inserted into the genome. By selecting the appropriate homologous sequence, the
vector may be directed to specific regions of the host cell genome. Alternatively, integration is directed
by inclusion of sequences necessary for site-specific recombination. A variety of site-specific
recombination systems are known. The cre-lox system comprises the Cre recombinase of
bacteriophage P1, which catalyzes recombination between short 34-base-pair loxP sites. The presence of
loxP sites on two different DNAs results in recombination between the two loxP sites, thus
generating a single recombinant containing two loxP sites flanking the integrated DNA (see, for
example, Fukushige, S. et al. (1992) Proc. Natl. Acad. Sci. USA 89: 7905-09). Cre-lox recombination
can function in any cell system containing loxP sites and Cre recombinase. Insertion of loxP sites
into the genome of organisms and expression of Cre allows for recombination events in bacterial,
yeast, plant, and mammalian cells (Sauer, B. (1996) Nucleic Acids Res. 24: 4608-13; Araki, K. et al.
(1997) Nucleic Acids Res. 25: 868-72; Vergunst, A.C. (1998) Plant Mol. Biol. 38: 393-406; US Pat.
No. 4,959,317).

Other systems applicable for integrating the expression vectors include, but are not limited to, the flp
recombinase system (see, for example, US Pat. No. 6,140,129), the λ integrase system,
bacteriophage Mu, transposon systems (e.g., γδ), retroviral vectors, and the like. As some of
the integration mechanisms function only in certain organisms, the appropriate integration system is
selected according to the cells in which the expression vectors are used, as is well known in the art.

In another preferred embodiment, the site-specific recombination sites are not used for integration but 
for deletion or rearrangement of nucleic acid sequences on the fusion nucleic acid. Suitable
site-specific recombination sequences include cre-lox and flp. Rearrangements may occur for fusion
nucleic acids present extrachromosomally or for fusion nucleic acids integrated into the host
chromosome. Generally, the site-specific recombination sequences flank the nucleic acid sequences
selected for deletion or rearrangement. That is, a first site-specific sequence is present 5' and a
second site-specific sequence is present 3' of the sequence to be deleted or rearranged. Thus, the
sites may flank promoters or promoter-controlling elements, genes of interest, splicing sequences,
translational controlling elements, or combinations thereof. Whether the site-specific recombination
sequences lead to deletion or rearrangement generally depends on the orientation of the recombination
sites. Placement of flp or loxP sites in head-to-head orientation (i.e., inverted repeat) results in
inversion of the intervening DNA, while placement in head-to-tail orientation (i.e., direct repeat)
results in excision of the intervening DNA. These features may be useful in several situations, for
example, when it is desirable to activate expression of the rGFP or pGFP fusion polypeptide in specific
cells or tissues, or at specific periods, especially at specific times in cellular development. To
achieve this effect, a rGFP or pGFP fusion nucleic acid, flanked by loxP or flp sites placed in inverted
repeat orientation, is linked in a reverse orientation relative to a promoter such that transcription
generates an antisense strand rather than the sense strand of the fusion nucleic acid encoding the
fusion polypeptide, thus resulting in the absence of rGFP or pGFP protein. To express rGFP or pGFP
protein in these cells, the recombinase is expressed in the cells, either by transfection or by inducing
expression of an endogenous copy of the recombinase, which results in inversion of the rGFP or
pGFP sequence relative to the promoter. This rearrangement places the gene in the proper orientation
for synthesis of the sense strand that leads to expression of the protein.
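The orientation rule described above (direct repeats excise, inverted repeats invert) can be illustrated with a minimal string model. This is a sketch only, assuming a single recombination event between a forward-oriented wild-type loxP site and the next downstream copy of the site in either orientation; the function names are hypothetical, and the model ignores recombinase kinetics, mutant lox variants, and the reversibility of inversion.

```python
# Minimal string model of Cre/loxP recombination outcomes.
# Assumptions (not from the source): one recombination event, the first
# site is in forward orientation, and sites are exact wild-type loxP copies.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34 bp: 13 bp repeat, 8 bp spacer, 13 bp repeat

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def recombine(dna: str, site: str = LOXP) -> str:
    """Apply one recombination between a forward site and the next site copy.

    Direct repeat (head-to-tail): intervening DNA plus one site is excised.
    Inverted repeat (head-to-head): intervening DNA is inverted in place.
    """
    first = dna.find(site)
    if first == -1:
        return dna  # no forward site, so nothing recombines in this model
    after_first = first + len(site)
    direct = dna.find(site, after_first)
    inverted = dna.find(revcomp(site), after_first)
    hits = [pos for pos in (direct, inverted) if pos != -1]
    if not hits:
        return dna  # only one site present
    second = min(hits)  # nearest downstream partner site
    if second == direct:
        # Excision: keep the DNA outside the sites and a single site copy.
        return dna[:first] + site + dna[second + len(site):]
    # Inversion: reverse-complement only the segment between the sites.
    return dna[:after_first] + revcomp(dna[after_first:second]) + dna[second:]
```

In this model, a coding sequence placed in reverse orientation between inverted sites is returned to the sense orientation after one event, mirroring the activation strategy described above.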

In a preferred embodiment, the expression vector also contains a selectable marker gene to allow the 
selection of transformed host cells. Generally, the selection gene will confer a detectable phenotype that
provides a way of differentiating between cells that do and do not express the selection gene.
Selection genes are well known in the art and will vary with the host cell used, as further described 
below. 

In accordance with the foregoing, a variety of expression vectors are used to express the nucleic acids
encoding the proteins of the present invention. As used herein, the term "vector" includes plasmids,
cosmids, artificial chromosomes, viruses, and the like. In one preferred embodiment, the expression
vectors are bacterial expression vectors, including vectors for Bacillus subtilis, E. coli, Haemophilus,
Streptococcus cremoris, and Streptococcus lividans, among others. These vectors are well known in
the art. A suitable bacterial promoter is any nucleic acid sequence capable of binding bacterial RNA
polymerase and initiating the downstream (3') transcription of the coding sequence of the fusion
protein into mRNA. A bacterial promoter has a transcription initiation region, which is usually placed
proximal to the 5' end of the coding sequence. This transcription initiation region typically includes an
RNA polymerase binding site and a transcription initiation site. Sequences encoding metabolic
pathway enzymes provide particularly useful promoter sequences. Examples include promoter
sequences derived from sugar-metabolizing enzymes, such as those for galactose, lactose and maltose,
and sequences derived from biosynthetic enzymes, such as those for tryptophan. Promoters from
bacteriophage (e.g., pL) may also be used and are known in the art. In addition, synthetic promoters and
hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and lac promoter
sequences. Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial
origin that have the ability to bind bacterial RNA polymerase and initiate transcription.

In addition to a functioning promoter sequence, an efficient ribosome binding site is desirable. In
E. coli, the ribosome binding site is the Shine-Dalgarno (SD) sequence and includes an initiation codon
and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon.
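The spacing rule described above can be expressed as a scan upstream of the initiation codon. This sketch assumes the extended consensus TAAGGAGGT (written in the DNA alphabet) as the ribosome-binding motif, accepts any 3-9 nt stretch of it, takes the first ATG in the sequence as the initiation codon, and measures the gap from the 3' end of the motif to the first base of that codon; the consensus choice, the tie-breaking order, and the function name are illustrative assumptions rather than anything specified in the source.

```python
SD_CONSENSUS = "TAAGGAGGT"  # assumed extended E. coli SD consensus, DNA alphabet


def find_shine_dalgarno(seq, start_codon="ATG", min_len=3, min_gap=3, max_gap=11):
    """Return (motif, gap) for an SD-like match upstream of the first start codon.

    `gap` is the number of nucleotides between the 3' end of the motif and the
    first base of the start codon. Longer matches to the consensus win; among
    equal lengths, the smallest gap wins. Returns None if nothing qualifies.
    """
    start = seq.find(start_codon)
    if start == -1:
        return None
    for length in range(len(SD_CONSENSUS), min_len - 1, -1):
        # Every contiguous stretch of the consensus with this length.
        cores = {SD_CONSENSUS[i:i + length]
                 for i in range(len(SD_CONSENSUS) - length + 1)}
        for gap in range(min_gap, max_gap + 1):
            end = start - gap
            begin = end - length
            if begin >= 0 and seq[begin:end] in cores:
                return seq[begin:end], gap
    return None
```

For example, in `TTTAAGGAGGTTTTTTTATG` the full 9-nt motif ends 6 nt before the ATG, which falls inside the 3-11 nt window.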

The expression vector may also include a signal peptide sequence that provides for secretion of the
fusion protein in bacteria. The signal sequence typically encodes a signal peptide composed of
hydrophobic amino acids, which directs the secretion of the protein from the cell, as is well known in the
art. The protein is either secreted into the growth media (gram-positive bacteria) or into the
periplasmic space, located between the inner and outer membrane of the cell (gram-negative
bacteria).
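The hydrophobic character of a signal peptide, mentioned above, is often screened computationally with a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, a technique swapped in for illustration and not described in the source; the 8-residue window, the 1.6 threshold, and the function names are hypothetical parameters, and a hit is only a rough indication of a hydrophobic core, not a prediction that secretion will occur.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(peptide, window=8):
    """Highest mean hydropathy over any contiguous `window`-residue stretch."""
    scores = [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]
    if len(scores) < window:
        raise ValueError("peptide is shorter than the window")
    return max(
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    )


def has_hydrophobic_core(peptide, window=8, threshold=1.6):
    """Hypothetical screen: True if some window reaches the threshold."""
    return max_window_hydropathy(peptide, window) >= threshold
```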

The bacterial expression vector may also include a selectable marker gene to allow for the selection of
bacterial strains that have been transformed. Suitable selection genes include genes that render the
bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin
and tetracycline. Selectable markers also include biosynthetic genes, such as those in the histidine,
tryptophan and leucine biosynthetic pathways. These components are assembled into expression
vectors and introduced into bacterial host cells using techniques well known in the art (e.g., calcium
chloride treatment, electroporation, etc.).

In another preferred embodiment, the expression vectors are used to express the proteins in yeast
cells. Yeast expression systems are well known in the art, and include expression vectors for
Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces
fragilis and K. lactis, Pichia guilliermondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia
lipolytica. Preferred promoter sequences for expression in yeast include the inducible GAL promoters
(e.g., GAL1, GAL4, GAL10, etc.), the promoters from alcohol dehydrogenase (ADH or ADC1),
enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase,
hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, pyruvate kinase, fructose
bisphosphate, acid phosphatase gene, tryptophan synthase (TRP5), and the copper-inducible CUP1
promoter. Any plasmid containing a yeast-compatible promoter, an origin of replication, and
termination sequences is suitable.

Yeast selectable markers include genes complementing mutations ADE2, HIS4, LEU2, TRP1, URA3,
and genes conferring resistance to tunicamycin (ALG7 gene) or G418 (neomycin phosphotransferase
gene), the ability to grow in the presence of copper ions (metallothionein CUP1 gene), or resistance to
fluoroacetate (fluoroacetate dehalogenase gene) or formaldehyde (formaldehyde dehydrogenase gene).

In another preferred embodiment, the expression vectors are used for expression in plants. Plant
expression vectors are well known in the art. Vectors are known for expressing genes in Arabidopsis
thaliana, tobacco, carrot, maize, and rice cells. Suitable promoters for use in plants include those
of plant or viral origin, including, but not limited to, the CaMV 35S promoter (active in both monocots
and dicots; Chapman, S. et al. (1992) Plant J. 2: 549-557), nopaline promoter, mannopine synthase
promoter, soybean or Arabidopsis thaliana heat shock promoters, tobacco mosaic virus promoter
(Takamatsu, et al. (1987) EMBO J. 6: 307), and AT2S promoters of Arabidopsis thaliana (e.g., PAT2S1,
PAT2S2, PAT2S3, etc.). In another preferred embodiment, the promoters are tissue-specific promoters
active in specific plant tissues or cell types (e.g., roots, leaves, shoot meristem, etc.), which are well
known in the art. Alternatively, the expression vectors comprise recombinant plasmid expression
vectors based on Ti plasmids or root-inducing plasmids.

In another aspect, regulatory sequences include "enhancers" to regulate expression. Preferably these
are of plant, bacterial (e.g., Agrobacterium), or viral origin and are specific to plants. The enhancers
may act at either the transcriptional or translational level. The fusion nucleic acids may also comprise
one or more introns, preferably of plant origin, to increase the efficiency of expression of the fusion
nucleic acid. For example, insertion of an intron into the 5' untranslated sequence of a gene (e.g.,
between the site of transcription initiation and translation initiation) leads to increased stability of the
messenger RNA. The intron is preferably, though not necessarily, the first intron.

Optionally, a selectable marker gene is used with the expression vectors. The marker may be a drug
resistance gene, a herbicide resistance gene, or any other selectable marker that can be used for
selecting cells containing the vector. Suitable plant markers include adenosine deaminase,
dihydrofolate reductase, hygromycin phosphotransferase, the bar gene (Lohar, D.P. (2001) J. Exp. Bot.
52: 1697-702), green fluorescent proteins (including the rGFP and pGFP of the present invention), and
aminoglycoside 3'-O-phosphotransferase II (i.e., kanamycin, neomycin, and G418 resistance).

In addition, the plant expression vectors may comprise plant-specific targeting sequences in addition
to the targeting sequences described above. In one aspect, the sequences are chloroplast or
mitochondrial targeting sequences. An example of a chloroplast targeting signal is that of the small
subunit of ribulose-1,5-bisphosphate carboxylase of Pisum sativum. For a mitochondrial targeting
sequence, an example is the precursor of the beta subunit of mitochondrial ATPase F1 of Nicotiana
plumbaginifolia. In another aspect, the targeting signal comprises a vacuolar targeting sequence or
"propeptide". These sequences target the proteins to vacuoles of aqueous tissues, including leaves, or
to protein bodies of storage tissues (Neuhaus, J.M. et al. (1991) Proc. Natl. Acad. Sci. USA 88:
10362-66; Sebastiani, F.L. et al. (1991) Eur. J. Biochem. 199: 441-50).

In another preferred embodiment, the expression vectors are used to express the proteins and nucleic
acids of the present invention in insects and insect cells. In one preferred embodiment, fusion
proteins are produced in insect cells. Expression vectors for the transformation of insect cells, and in
particular, baculovirus vectors used to create recombinant baculoviruses for expressing foreign genes,
are well known in the art (see for example, O'Reilly, D.R. et al. "Baculovirus Expression Vectors: A
Laboratory Manual," W.H. Freeman & Co., New York, 1992). By "baculovirus" or "nuclear polyhedrosis
viruses" as used herein is meant expression systems using viruses classified under the family
Baculoviridae, preferably subgroup A. In preferred embodiments, these include systems specific for
Bombyx, Autographa, and Spodoptera (see for example, US Pat. No. 5,194,376). Other expression
systems include Amsacta moorei entomopoxvirus (AmEPV), Aedes aegypti densonucleosis virus (Aedes
DNV, US Pat. No. 5,849,523), and Galleria mellonella densovirus (GmDNV; Tal, et al. (1993) Arch.
Insect Biochem. Physiol. 22: 345-356). In another preferred embodiment, expression vectors
comprise fusion nucleic acids that integrate into the host chromosome. This may be achieved by
homologous recombination, particularly modified homologous recombination techniques when the
insect cells or insects do not readily undergo homologous recombination (see Rong, Y.S. (2000)
Science 288: 2013-18), by site-directed recombination (e.g., cre-lox), or by transposon-mediated
integration (e.g., P-element transposition elements).

Promoters suitable for controlling expression in insects include the Autographa californica nuclear
polyhedrosis virus polyhedrin promoter, heat shock promoters (e.g., hsp70), the tubulin promoter, the
p10 promoter, and the Aedes DNV viral p7 and p61 promoters. In one preferred embodiment, the
promoter allows expression at an early stage in viral infection and/or allows expression in substantially
all tissues of an insect. In another preferred embodiment, the promoter is a cell-specific or
developmental stage-specific promoter, many of which are well known in the art. As used herein,
developmental stage-specific promoters are promoters that are active only at certain stages in insect
development, for example, embryonic, larval, pupal, and adult stages. Examples of developmental
stage-specific promoters are the ecdysone-regulated promoters, which are active during molting and
larval/pupal stages because of increases in the steroid hormone ecdysone during these
developmental periods. Cell-specific promoters include promoters active in the nervous system (e.g.,
ELAV), imaginal discs, gut, Malpighian tubules, antennae (e.g., odor binding protein gene promoter),
etc.



Although mammalian targeting sequences function in insect cells, targeting sequences derived from
insect genes are preferred under some circumstances, for example to efficiently express secreted or
membrane-bound proteins in insect cells. Signal sequences include the Manduca sexta AKH signal
peptide sequence, Drosophila cuticle protein signal peptides (e.g., CP1, CP2, CP3 and CP4, U.S. Pat.
No. 5,278,050), and the honey bee melittin excretion peptide (MKFLVDVALVFMVVYISYIYA).

In a preferred embodiment, the expression vectors are used for expression in animals, especially
mammals. A variety of expression vectors are known for expressing proteins in animal cells, including
fusion nucleic acids existing extrachromosomally, as integrants in the host chromosome, or as viral
nucleic acids. Viral vectors may be based on adenoviral, lentiviral, alphaviral, poxviral (vaccinia
virus), or retroviral vectors. In a preferred embodiment, the viral expression vector system is a
retroviral vector such as is generally described in PCT/US97/01019 and PCT/US97/01048, both of
which are hereby expressly incorporated by reference.

By "retroviral vectors" herein is meant vectors used to introduce into appropriate hosts the nucleic
acids of the present invention in the form of an RNA viral particle. A variety of retroviral vectors are
known in the art. Preferred retroviral vectors include a vector based on the murine stem cell virus
(MSCV) (Hawley, R.G. et al. (1994) Gene Ther. 1: 136-38), a modified MFG virus (Riviere, I. et al.
(1995) Genetics 92: 6733-37), and pBABE (see PCT US97/01019). In addition, particularly well-suited
retroviral transfection systems for generating retroviral vectors are described in Mann et al., supra;
Pear, W.S. et al. (1993) Proc. Natl. Acad. Sci. USA 90: 8392-96; Kitamura, T. et al. (1995) Proc. Natl.
Acad. Sci. USA 92: 9146-50; Kinsella, T.M. et al. (1996) Hum. Gene Ther. 7: 1405-13; Hofmann, A. et
al. (1996) Proc. Natl. Acad. Sci. USA 93: 5185-90; Choate, K.A. et al. (1996) Hum. Gene Ther. 7:
2247-53; WO 94/19478; PCT US97/01019, and references cited therein, all of which are incorporated
by reference. Other suitable retroviral vectors include, among others, the LRCX retroviral vector set,
pSIR retroviral vector, pLEGFP-N1 retroviral vector, pLAPSN retroviral vector, pLXIN retroviral vector,
and pLXSN retroviral vector, all of which are commercially available (e.g., Clontech). Generally, the
retroviral vectors described above are used to express the nucleic acids of the present invention in
proliferating cells. When target cells are non-proliferating (e.g., brain cells), useful viral vectors are
derived from lentiviruses (Miyoshi, H. et al. (1998) J. Virol. 72: 8150-57), adenoviruses (Zheng, C. et
al. (2000) Nat. Biotechnol. 18: 176-80) or alphaviruses (Ehrengruber, M.U. (1999) Proc. Natl. Acad.
Sci. USA 96: 7041-46). In addition, the retroviral vectors may incorporate the self-inactivating (SIN)
feature, in which the 3' LTR enhancer/promoter is disabled so that viral promoters are inactivated upon
integration; this allows use of other promoters for regulating expression of the fusion nucleic acid. It is
possible to configure these SIN retroviral vectors to permit inducible expression of retroviral inserts
after integration of a single vector into a target cell (Hofmann, et al. (1996) Proc. Natl. Acad. Sci. USA
93: 5185).

The mammalian vectors may include inducible and constitutive promoters for expressing the genes of
interest encoding the polypeptides of the present invention. A mammalian promoter will have a
transcription-initiating region, generally located 5' to the start of the coding region, and a TATA box,
present at about 25-30 base pairs upstream of the transcription initiation site. The promoter will also
contain upstream regulatory elements that control the rate and initiation of transcription, including
CAAT and GC boxes, enhancer sequences, and repressor/silencer sequences (see for example, Chang,
B.D. (1996) Gene 183: 137-42). These promoter-controlling elements may act directionally, requiring
placement upstream of the promoter region, or act non-directionally. These transcriptional control
sequences may be provided from non-viral or viral sources. Commonly used promoters and enhancers
are from viral sources, since the viral genes have a broad host range and produce high expression
rates. Viral promoters, including upstream controlling sequences, may be from polyoma virus,
adenovirus 2, simian virus 40 (early and late promoters), herpes simplex virus (e.g., HSV thymidine
kinase promoter), human cytomegalovirus (CMV promoter), and mouse mammary tumor virus
(MMTV-LTR promoter). A variety of non-viral promoters with constitutive, inducible, cell-specific, or
developmental stage-specific activities are also well known in the art (e.g., β-globin promoter,
mammalian heat shock promoter, metallothionein, ubiquitin C promoters, EF-1alpha promoters, etc.).
Cell-specific promoters, which are well known in the art, include promoters active in specific cells
including, but not limited to, brain, olfactory bulb, thyroid, lung, muscle, pancreas, liver, heart, breast,
prostate, kidney, etc. Promoters and promoter-controlling elements are chosen based on the desired
level of promoter activity and the cell type in which the proteins of the present
invention are to be expressed.
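The TATA-box spacing described above can be checked on a candidate promoter with a short scan. This is a sketch under stated assumptions: the TATA(A/T)A(A/T) consensus pattern, the convention that distance is counted from the first base of the motif to the transcription start site, and the function name are illustrative choices rather than anything specified in the source.

```python
import re

# Assumed TATA-box consensus TATAWAW, where W is A or T.
TATA_MOTIF = re.compile(r"TATA[AT]A[AT]")


def find_tata_box(seq, tss, min_up=25, max_up=30):
    """Return how many bp upstream of the TSS a TATA-like motif starts, or None.

    `tss` is the 0-based index of the transcription start site in `seq`.
    Distances are scanned from `min_up` to `max_up`, nearest first.
    """
    for up in range(min_up, max_up + 1):
        pos = tss - up
        # Pattern.match anchors the motif exactly at `pos`.
        if pos >= 0 and TATA_MOTIF.match(seq, pos):
            return up
    return None
```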

Generally, the mammalian vectors also include selectable marker genes. Suitable marker genes
include reporter or selection genes as further described below. Selection genes include, but are not
limited to, genes conferring resistance to neomycin, blasticidin, bleomycin, puromycin, and hygromycin,
and multiple drug resistance (MDR) genes. Suitable reporter genes include fluorescent proteins (e.g.,
green fluorescent proteins), luciferases, enzymatic markers (e.g., β-galactosidase, glucuronidase,
alkaline phosphatase, etc.), and surface proteins (e.g., CD8).

Additional sequences in the expression vectors include splice sites for proper expression,
polyadenylation signals, 5' cap sequences, transcription termination sequences, and the like.
Typically, transcription termination and polyadenylation sequences recognized by mammalian cells are
regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific
post-transcriptional cleavage and polyadenylation. Examples of transcription terminator and
polyadenylation signals include those derived from SV40.
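The arrangement described above, in which the polyadenylation signal lies 3' of the stop codon, can be located in a sequence with a simple scan. The sketch assumes the canonical AATAAA hexamer and its common ATTAAA variant, and a cleavage window of roughly 10 to 30 nucleotides downstream of the hexamer, measured here from its 3' end (an assumed convention); real site choice also depends on downstream GU-rich elements, which this sketch ignores. The function names are hypothetical.

```python
POLYA_SIGNALS = ("AATAAA", "ATTAAA")  # canonical hexamer and a common variant


def find_polya_signal(seq, stop_index):
    """Index of the first poly(A) signal hexamer 3' of the stop codon, or None.

    `stop_index` is the 0-based position of the first base of the stop codon.
    """
    search_from = stop_index + 3  # start looking just past the stop codon
    hits = [seq.find(sig, search_from) for sig in POLYA_SIGNALS]
    hits = [h for h in hits if h != -1]
    return min(hits) if hits else None


def predicted_cleavage_window(signal_index, near=10, far=30):
    """Approximate (start, end) cleavage region, measured from the hexamer's 3' end."""
    hexamer_end = signal_index + 6
    return hexamer_end + near, hexamer_end + far
```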

Other sequences may include centromere sequences for generating human artificial chromosomes
(HACs) for delivering larger fragments of DNA than can be contained and expressed in a plasmid or
viral vector. HACs of 6 to 10 Mb are constructed and delivered via conventional delivery methods
(liposomes, polycationic amino polymers, or vesicles) for therapeutic purposes. The choice and
design of an appropriate vector is within the ability and discretion of one of ordinary skill in the art.

In a further preferred embodiment, the fusion nucleic acids of the present invention may comprise a
first gene of interest, a separation sequence, and a second gene of interest. In a preferred
embodiment, at least one of the genes of interest encodes a rGFP or pGFP or a variant thereof, or a
rGFP or pGFP fusion polypeptide described above. By "gene of interest" herein is meant any nucleic
acid sequence capable of encoding a "protein of interest" or a "protein," as defined below. However, in
some embodiments, the "gene of interest" encompasses a nucleic acid sequence element that does
not encode a protein. These elements may include, but are not limited to, promoter/enhancer
elements, chromatin organizing sequences, ribosome binding sequences, mRNA splicing sequences,
multiple cloning sites, etc.

In a preferred embodiment, the gene of interest is a reporter gene. By "reporter gene" or "selection
gene" or grammatical equivalents herein is meant a gene that by its presence in a cell (i.e., upon
expression) allows the cell to be distinguished from a cell that does not contain the reporter gene.
Reporter genes can be classified into several different types, including detection genes, survival
genes, death genes, cell cycle genes, cellular biosensors, proteins producing a dominant cellular
phenotype, and conditional gene products. In the present invention, expression of the protein product
causes the effect distinguishing between cells expressing the reporter gene and those that do not. As
is more fully outlined below, additional components, such as substrates, ligands, etc., may be
added to allow selection or sorting on the basis of the reporter gene.

In a preferred embodiment, the first and second genes of interest encode the same rGFP or pGFP.
These constructs allow increased expression of the GFP molecule or GFP fusion polypeptide, since
two copies of the same gene are expressed in a single transcriptional event. The presence of a
separation sequence allows the synthesis of separate fluorescent proteins, thus obviating any
detrimental effect that might arise from fusing two reporter proteins to each other. Synthesizing high
levels of encoded protein is desirable when needed to produce a cellular phenotype, for example
when expressing a random peptide fused to rGFP or pGFP. Similarly, for example when screening for
promoter regulators, signal amplification may be accomplished by expressing two identical rGFP or
pGFP reporter genes.
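For the tandem constructs above, one practical check is that the two genes and the separation sequence form a single uninterrupted reading frame when the separation sequence is translated in frame with both genes (for example, a ribosome-skipping 2A-type peptide). That separator type is an assumption, since the passage does not specify one; an IRES-based separator would instead keep the first gene's stop codon and is not modeled. The toy sequences in any usage are placeholders, not rGFP or pGFP sequences, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into consecutive triplets from the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def assemble_in_frame(gene1, separator, gene2):
    """Join gene1 (stop removed) + separator + gene2 into one open reading frame.

    Assumes the separator is translated in frame with both genes.
    Raises ValueError on frame errors or premature stop codons.
    """
    for name, part in (("gene1", gene1), ("separator", separator), ("gene2", gene2)):
        if len(part) % 3:
            raise ValueError(f"{name} length is not a multiple of 3")
    if gene1 and codons(gene1)[-1] in STOP_CODONS:
        gene1 = gene1[:-3]  # drop the stop so translation reads into the separator
    orf = gene1 + separator + gene2
    # Only the final codon of the assembled ORF may be a stop codon.
    if any(c in STOP_CODONS for c in codons(orf)[:-1]):
        raise ValueError("premature stop codon in assembled ORF")
    return orf
```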

In another preferred embodiment, the gene of interest comprises a reporter gene distinguishable from
rGFP or pGFP. Expressing two distinguishable, separate reporter proteins allows targeting of
individual reporter proteins to distinct cellular locations, provides increased discrimination of cells
expressing the fusion nucleic acid, and affords a basis for monitoring expression of the other reporter
gene.

In a preferred embodiment, the distinguishable reporter gene comprises a protein that can be used as
a direct label, for example a detection gene for sorting the cells or for cell enrichment by FACS. In this
embodiment, the protein product of the reporter gene itself can serve to distinguish cells that are
expressing the reporter gene. In one aspect, suitable reporter genes include distinguishable wildtype
and variant forms of Renilla reniformis GFP, Ptilosarcus gurneyi GFP, and Renilla muelleri GFP. In
another aspect, the reporter gene comprises other fluorescent proteins, such as Aequorea victoria
GFP (Chalfie, M. et al. (1994) Science 263: 802-05), EGFP (Clontech; GenBank Accession Number
U55762), blue fluorescent protein (BFP; Quantum Biotechnologies, Inc., 1801 de Maisonneuve Blvd.
West, 8th Floor, Montreal (Quebec) Canada H3H 1J9; Stauber, R.H. (1998) Biotechniques 24: 462-71;
Heim, R. et al. (1996) Curr. Biol. 6: 178-82), enhanced yellow fluorescent protein (EYFP;
Clontech Laboratories, Inc., 1020 East Meadow Circle, Palo Alto, CA 94303), Anemonia majano
fluorescent protein (amFP486; Matz, M.V. (1999) Nat. Biotech. 17: 969-73), Zoanthus fluorescent
proteins (zFP506, zFP538; Matz, supra), Discosoma fluorescent proteins (dsFP483, drFP583; Matz,
supra), and Clavularia fluorescent protein (cFP484; Matz, supra). Other suitable reporter genes
include, among others, luciferases (for example, firefly, Kennedy, H.J. et al. (1999) J. Biol. Chem. 274:
13281-91; Renilla reniformis, Lorenz, W.W. (1996) J. Biolumin. Chemilumin. 11: 31-37; Renilla
muelleri, U.S. Patent No. 6,232,107), β-galactosidase (Nolan, G. et al. (1988) Proc. Natl. Acad. Sci.
USA 85: 2603-07), β-glucuronidase (Jefferson, R.A. et al. (1987) EMBO J. 6: 3901-07; Gallagher, S.,
"GUS Protocols: Using the GUS Gene as a Reporter of Gene Expression," Academic Press, Inc., 1992),
horseradish peroxidase, alkaline phosphatase, and SEAP (i.e., the secreted form of human placental
alkaline phosphatase; Cullen, B.R. et al. (1992) Methods Enzymol. 216: 362-68).
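Sorting on a direct fluorescent label, as described above, usually starts by setting a gate from cells that lack the reporter. The sketch below places the threshold at a nearest-rank percentile of a reporter-negative control population and keeps sample events above it; the 99th-percentile default and the function names are hypothetical choices, and real cytometry gating also involves compensation, scatter gates, and doublet exclusion, none of which are modeled here.

```python
import math


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile of a non-empty sequence, with 0 < q <= 100."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def gate_positive(sample, negative_control, q=99.0):
    """Return (events above the gate, gate threshold).

    The threshold is the q-th percentile of reporter-negative control cells,
    so at most about (100 - q)% of control events fall above it.
    """
    threshold = nearest_rank_percentile(negative_control, q)
    positives = [x for x in sample if x > threshold]
    return positives, threshold
```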

In another embodiment, the reporter gene encodes a protein that will bind a label that can be used as
the basis of the cell enrichment (sorting); that is, the reporter gene serves as an indirect label or
detection gene. In a preferred embodiment, the reporter gene encodes a cell-surface protein. For
example, the reporter gene may be any cell-surface protein not normally expressed on the surface of
the cell, such that secondary binding agents serve to distinguish cells that contain the reporter gene
from those that do not. Alternatively, albeit non-preferably, reporters comprising normally expressed
cell-surface proteins could be used, and differences between cells containing the reporter construct
and those without could be determined. Thus, secondary binding agents bind to the reporter protein.
These secondary binding agents are preferably labeled, for example with fluors, and can be
antibodies, haptens, etc. For example, fluorescently labeled antibodies to the reporter gene product can
be used as the label. Similarly, membrane-tethered streptavidin could serve as a reporter gene, and
fluorescently labeled biotin could be used as the label, i.e., the secondary binding agent. Alternatively,
the secondary binding agents need not be labeled as long as the secondary binding agent can be
used to distinguish the cells containing the construct; for example, the secondary binding agents may
be used in a column, and the cells passed through, such that expression of the reporter gene
results in the cell being bound to the column, and a lack of the reporter gene (i.e., inhibition) results in
the cells not being retained on the column. Other suitable reporter proteins/secondary labels include,
but are not limited to, antigens and antibodies, enzymes and substrates (or inhibitors), etc.

In a preferred embodiment, the reporter gene comprises a survival gene, that is, a gene without
whose product the cell cannot survive under selection conditions, such as a drug resistance gene. In this
embodiment, expressing the survival gene allows selection of cells expressing the fusion nucleic acid
by identifying cells that survive, for example in the presence of a selection compound. Examples of drug
resistance genes include, but are not limited to, puromycin resistance (puromycin-N-acetyltransferase)
(de la Luna, S. et al. (1992) Methods Enzymol. 216: 376-85), G418 neomycin resistance
gene, hygromycin resistance gene (hph), and blasticidin resistance genes (bsr, brs, and BSD;
Perez-Gonzalez, et al. (1990) Gene 86: 129-34; Izumi, M. et al. (1991) Exp. Cell Res. 197: 229-33; Itaya, M.
et al. (1990) J. Biochem. 107: 799-801; Kimura, M. et al. (1994) Mol. Gen. Genet. 242: 121-29). In
addition, generally applicable survival genes are the family of ATP-binding cassette transporters,
including multiple drug resistance gene (MDR1) (see Kane, S.E. et al. (1988) Mol. Cell. Biol. 8: 3316-21
and Choi, et al. (1988) Cell 53: 519-29), multidrug resistance associated proteins (MRP)
(Bera, T.K. et al. (2001) Mol. Med. 7: 509-16), and breast cancer resistance protein (BCRP or MXR)
(Tan, B. et al. (2000) Curr. Opin. Oncol. 12: 450-58). When expressed in cells, these selectable
transporter genes can confer resistance to a variety of toxic reagents, especially anti-cancer drugs
(e.g., methotrexate, colchicine, tamoxifen, mitoxantrone, and doxorubicin). As will be appreciated by
those skilled in the art, the choice of the selection/survival gene will depend on the host cell type used.

In a preferred embodiment, the reporter gene comprises a death gene that causes the cells to die
when expressed. Death genes fall into two basic categories: death genes that encode death proteins
requiring a death ligand to kill the cells, and death genes that encode death proteins that kill cells as a
result of high expression within the cell and do not require the addition of any death ligand. Preferred
are cell death mechanisms that require a two-step process: the expression of the death gene and
induction of the death phenotype with a signal or ligand, such that the cells may be grown expressing
the death gene and then induced to die. A number of death gene/ligand pairs are known, including,
but not limited to, the Fas receptor and Fas ligand (Schneider, P. et al. (1997) J. Biol. Chem. 272:
18827-33; Gonzalez-Cuadrado, S. et al. (1997) Kidney Int. 51: 1739-46; Muruve, D.A. et al. (1997)
Hum. Gene Ther. 8: 955-63); p450 and cyclophosphamide (Chen, L. et al. (1997) Cancer Res. 57:
4830-37); thymidine kinase and ganciclovir (Stone, R. (1992) Science 256: 1513); and tumor necrosis
factor (TNF) receptor and TNF.

When death genes requiring ligands are used, preferred embodiments use chimeric death genes (i.e.,
chimeric death receptor genes). Chimeric death receptors may comprise the extracellular domain of a
ligand-activated multimerizing receptor and the endogenous cytoplasmic domain of a death receptor
gene, such as Fas or TNF. This avoids endogenous activation of the death gene. Thus, in one
embodiment, substituting the extracellular portion of a death receptor, such as Fas, with the
extracellular portion of another ligand-activated multimerizing receptor provides a basis for using a
completely different signal to activate cell death. Suitable ligand-activated dimerizing receptors
include, but are not limited to, the CD8 receptor, erythropoietin receptor, thrombopoietin receptor,
growth hormone receptor, Fas receptor, platelet-derived growth factor receptor, epidermal growth
factor receptor, leptin receptor, and various interleukin receptors (e.g., IL-1, IL-2, IL-3, IL-4, IL-5, IL-6,
IL-7, IL-8, IL-9, IL-11, IL-12, IL-13, IL-15, and IL-17). When particular receptors are employed to
modulate promoter activity, these receptors (e.g., IL-4 when examining IL-4-induced promoter activity)
are not preferred for use as a chimeric death gene receptor.

In a preferred embodiment, the chimeric cell death receptor genes are chimeric Fas receptors. The
exact combination will depend on the cell type used and the receptors normally produced by these
cells. For illustration, when the cells are human cells, a non-human extracellular domain and a human
cytosolic domain are preferred to prevent endogenous induction of the death gene. Thus, when
human cells are used, a preferred chimeric receptor gene may comprise a murine extracellular Fas
receptor domain and a human cytosolic domain, such that the endogenous human Fas ligand will not
activate the murine receptor domain. Alternatively, human extracellular domains may be used when
the cells do not endogenously produce the cognate ligand. For example, the human EPO receptor
extracellular domain may be used when cells do not endogenously produce EPO (Kawaguchi, Y. et al.
(1997) Cancer Lett. 116: 53-59; Takebayashi, H. et al. (1996) Cancer Res. 56: 4164; Rudert, F. et al.
(1994) Biochem. Biophys. Res. Commun. 204: 1102-10; Takahashi, T. et al. (1996) J. Biol. Chem. 271:
17555-60). In another aspect, the extracellular domains are combinations of different extracellular
domains that form functional receptors (Mares, et al. (1992) Growth Factors 6: 93-101; Seedorf, K. et al.
(1991) J. Biol. Chem. 266: 12424-31; Heidaran, M.A. et al. (1990) J. Biol. Chem. 265: 18741-44;
Okuda, K. et al. (1997) J. Clin. Invest. 100: 1708-15; Anders, R.A. et al. (1996) J. Biol. Chem. 271:
21758-66; Krishnan, K. et al. (1996) Oncogene 13: 126-33; Ohashi, et al. (1994) Proc. Natl. Acad.
Sci. USA 91: 158-62; and Amara, J.F. et al. (1997) Proc. Natl. Acad. Sci. USA 94: 10618-23). In
general, the chimeric death gene receptors have a transmembrane domain. As will be appreciated by
those skilled in the art, the transmembrane domain from any of the receptors can be used, although it
is preferable to use the transmembrane domain associated with the chosen cytosolic domain to
preserve the interaction of the transmembrane domain with other endogenous signaling proteins
(Declercq, W. et al. (1995) Cytokine 7: 701-09).

Alternatively, the death genes are "one-step" death genes, which kill a cell as a result of high
expression of the gene without requiring a ligand or secondary signal. In one aspect, cell death is
induced by the overexpression of any of a number of programmed cell death (PCD) proteins known to
cause cell death, including, but not limited to, caspases, bax, TRADD, FADD, SCK, MEK, etc.

In another aspect, one step death genes also include toxins that cause cell death, or impair cell
survival or cell function, when expressed by a cell. These toxins generally do not require addition of a
ligand to produce toxicity. An example of a suitable toxin is the Campylobacter toxin CDT (Lara-Tejero, M.
(2000) Science 290: 354-57). Expression of the CdtB subunit, which has homology to nucleases, causes
cell cycle arrest and ultimately cell death. Another toxin, diphtheria toxin (and the similar Pseudomonas
exotoxin), functions by ADP-ribosylating elongation factor 2 (EF-2) in the cell, thereby
preventing translation. Expression of the diphtheria toxin A subunit induces death of the cells
expressing the toxin fragment. Other useful toxins include cholera toxin and pertussis toxin (the catalytic
A subunit ADP-ribosylates G proteins that regulate adenylate cyclase), pierisin from cabbage butterflies
(induces apoptosis in mammalian cells; Watanabe, M. (1999) Proc. Natl. Acad. Sci. USA 96: 10608-13),
phospholipase snake venom toxins (Diaz, C. et al. (2001) Arch. Biochem. Biophys. 391: 56-64),
ribosome-inactivating toxins (e.g., ricin A chain, Gluck, A. et al. (1992) J. Mol. Biol. 226: 411-24; and
nigrin, Munoz, R. et al. (2001) Cancer Lett. 167: 163-69), and pore-forming toxins (hemolysin and
leukocidin). When the target cells are neuronal cells, neuron-specific toxins may be used to inhibit
specific neuronal functions. These include bacterial toxins such as botulinum toxin and tetanus toxin,
which are proteases that act on synaptic vesicle-associated proteins (e.g., synaptobrevin) to prevent
neurotransmitter release (see Binz, T. et al. (1994) J. Biol. Chem. 269: 9153-58; Lacy, D.B. et al.
(1998) Curr. Opin. Struct. Biol. 8: 778-84).

Another preferred embodiment of a gene of interest is a cell cycle gene, that is, a gene that causes
alterations in the cell cycle. For example, the Cdk-interacting protein p21 (see Harper, J.W. et al. (1993)
Cell 75: 805-16), which inhibits cyclin-dependent kinases, does not cause cell death but causes cell-cycle
arrest. Thus, expressing p21 allows selection of regulators of promoter activity or regulators of
p21 activity by detecting cells that grow out much more quickly because of low p21 activity, resulting
either from inhibition of promoter activity or from inactivation of the p21 protein. As will be appreciated by
those skilled in the art, it is also possible to configure the system to select cells based on their inability to
grow out due to increased p21 activity. Similar mitotic inhibitors include p27, p57, p16, p15, p18,
p19, and p19ARF (or its human homolog p14ARF). Other cell cycle proteins useful for altering the cell cycle
include cyclins (Cln), cyclin-dependent kinases (Cdk), cell cycle checkpoint proteins (e.g., Rad17, p53),
Cks1 (p9), Cdc phosphatases (e.g., Cdc25), etc.
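The growth-out selection relies on a small difference in proliferation rate compounding over time. The following Python sketch assumes simple exponential growth and uses hypothetical doubling times (not values from this specification) to show how the fraction of cells with low p21 activity rises in a mixed culture.

```python
def fast_fraction(t_hours: float, f0: float, td_fast: float, td_slow: float) -> float:
    """Fraction of fast-growing cells after t_hours of exponential growth.

    f0:      starting fraction of fast-growing (e.g., low-p21) cells
    td_fast: doubling time of the fast-growing cells, in hours (hypothetical)
    td_slow: doubling time of the p21-restrained cells, in hours (hypothetical)
    """
    if not 0.0 <= f0 <= 1.0:
        raise ValueError("f0 must be between 0 and 1")
    fast = f0 * 2 ** (t_hours / td_fast)
    slow = (1.0 - f0) * 2 ** (t_hours / td_slow)
    return fast / (fast + slow)


if __name__ == "__main__":
    # Hypothetical: 1% of cells escape p21-mediated arrest and double every 24 h,
    # while the remaining cells double only every 96 h.
    for days in (0, 4, 8, 12):
        print(days, round(fast_fraction(24 * days, 0.01, 24.0, 96.0), 3))
```

Under these assumed rates the rare fast-growing cells go from about 1% to roughly 39% of the culture within eight days, which is why growth-out can be used as a selection without any sorting step.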

In yet another preferred embodiment, the gene of interest encodes a cellular biosensor. In these
fusion nucleic acids, at least one of the genes of interest may encode an rGFP or pGFP fusion
polypeptide, which is itself a cellular biosensor, or the cellular biosensor may be expressed in addition
to the rGFP or pGFP (or rGFP or pGFP fusion protein). By a "cellular biosensor" herein is meant a
gene product that, when expressed within a cell, can provide information about a particular cellular
state. Biosensor proteins allow rapid determination of changing cellular conditions, for example Ca2+
levels in the cell, pH within cellular organelles, and membrane potentials (see Miesenbock, G. et al.
(1998) Nature 394: 192-95; US Pat. No. 6,150,176). An example of an intracellular biosensor is
Aequorin, which emits light upon binding to Ca2+ ions. The intensity of the emitted light depends on the
Ca2+ concentration, thus allowing measurement of transient calcium concentrations within the cell.
When directed to particular cellular organelles by fusion partners, as more fully described below, the
light emitted by Aequorin provides information about Ca2+ concentrations within the particular
organelle. Other intracellular biosensors are chimeric GFP molecules engineered for fluorescence
resonance energy transfer (FRET) upon binding of an analyte, such as Ca2+ (US Pat. No. 6,197,928;
Miyawaki, A. et al. (1997) Nature 388: 882-87; Miyakawa, A. et al. (1997) Mol. Cell. Biol. 8: 2659-76).
For example, cameleon comprises a blue or cyan mutant of GFP, calmodulin (CaM), the CaM-binding
domain of myosin light chain kinase, and a green or yellow GFP. Upon binding of Ca2+ by the CaM domain,
FRET occurs between the two GFPs because of a structural change in the chimera. Thus, FRET
intensity depends on the Ca2+ levels within the cell or organelle (Kerr, R. et al. (2000) Neuron 26:
583-94). Other examples of intracellular biosensors include sensors for detecting changes in cell
membrane potential (Siegel, M. et al. (1997) Neuron 19: 735-41; Sakai, R. (2001) Eur. J. Neurosci. 13:
2314-18), monitoring exocytosis (Miesenbock, G. et al. (1997) Proc. Natl. Acad. Sci. USA 94: 3402-07),
and measuring intracellular/organellar ATP concentrations via luciferase protein (Kennedy, H.J. et
al. (1999) J. Biol. Chem. 274: 13281-91). These biosensors find use in monitoring the effects of
various cellular effectors, for example pharmacological agents that modulate ion channel activity,
neurotransmitter release, ion fluxes within the cell, and changes in ATP metabolism.
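The FRET readout of cameleon-type sensors depends steeply on the donor-acceptor separation. A minimal Python sketch of the standard Förster relation, E = 1/(1 + (r/R0)^6), follows; the 5 nm Förster radius and the two distances are illustrative assumptions, not measured values for any particular GFP pair or for the sensors cited above.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster energy-transfer efficiency for donor-acceptor distance r and Förster radius R0."""
    if r_nm <= 0 or r0_nm <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


if __name__ == "__main__":
    R0 = 5.0  # hypothetical Förster radius in nm
    # Hypothetical donor-acceptor distances before and after a Ca2+-induced conformational change
    for label, r in (("Ca2+-free", 6.5), ("Ca2+-bound", 4.5)):
        print(label, round(fret_efficiency(r, R0), 2))
```

With these assumed numbers, a 2 nm shortening of the separation raises the transfer efficiency from roughly 0.17 to roughly 0.65; the sixth-power dependence is what lets a modest conformational change produce a large ratiometric signal.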

Other intracellular biosensors comprise detectable gene products with sequences that are responsive
to changes in intracellular signals. These sequences include peptide sequences acting as substrates
for protein kinases, peptides with binding regions for second messengers, and protein interaction
sequences sensitive to intracellular signaling events (see, for example, U.S. Pat. No. 5,958,713 and
U.S. Pat. No. 5,925,558). For example, a fusion protein construct comprising a GFP and a protein
kinase recognition site allows detection of intracellular protein kinase activity by measuring changes in
GFP fluorescence arising from phosphorylation of the fusion construct. Alternatively, the GFP is fused
to a protein interaction domain whose interaction with cellular components is altered by cellular
signaling events. For example, it is well known that inositol trisphosphate (InsP3) induces release of
Ca2+ from intracellular stores into the cytoplasm, which results in activation of kinases responsible
for regulating various cellular responses. The precursor to InsP3 is phosphatidylinositol
4,5-bisphosphate (PtdInsP2), which is localized in the plasma membrane and cleaved by phospholipase C
(PLC) following activation of an appropriate receptor. Many signaling enzymes are sequestered at the
plasma membrane through pleckstrin homology domains that bind specifically to PtdInsP2. Following
cleavage of PtdInsP2, the signaling proteins translocate from the plasma membrane into the cytosol,
where they activate various cellular pathways. Thus, a reporter molecule such as GFP fused to a
pleckstrin homology domain will act as an intracellular sensor for phospholipase C activation (see Haugh, J.M. et
al. (2000) J. Cell Biol. 15: 1269-80; Jacobs, A.R. et al. (2001) J. Biol. Chem. 276: 40795-802; and
Wang, D.S. et al. (1996) Biochem. Biophys. Res. Commun. 225: 420-26). Other similar constructs
are useful for monitoring activation of other signaling cascades and are applicable as assays in
screens for candidate agents that inhibit or activate particular signaling pathways.
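Translocation of a GFP-pleckstrin homology fusion is commonly quantified from images as the ratio of cytosolic to plasma-membrane fluorescence. The Python sketch below assumes that background-subtracted pixel intensities for each compartment have already been segmented; the function names and example intensities are hypothetical.

```python
from statistics import mean


def translocation_ratio(cytosol: list[float], membrane: list[float]) -> float:
    """Mean cytosolic intensity divided by mean plasma-membrane intensity."""
    if not cytosol or not membrane:
        raise ValueError("both compartments need at least one pixel")
    m = mean(membrane)
    if m <= 0:
        raise ValueError("membrane intensity must be positive")
    return mean(cytosol) / m


def fold_change(before: float, after: float) -> float:
    """Fold increase in the ratio; values above 1 suggest the probe has left the membrane."""
    return after / before


if __name__ == "__main__":
    # Hypothetical pixel intensities before and after receptor stimulation
    resting = translocation_ratio([100, 110, 90], [400, 420, 380])
    stimulated = translocation_ratio([300, 310, 290], [200, 210, 190])
    print(round(fold_change(resting, stimulated), 2))
```

A rise in the cytosol-to-membrane ratio after stimulation is consistent with PtdInsP2 hydrolysis releasing the probe; in a screen, a candidate agent that blunts this fold change would be a possible PLC-pathway inhibitor.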

Since protein-interaction domains, such as the pleckstrin homology domain described above, are important
mediators of cellular responses and biochemical processes, other preferred genes of interest are
proteins containing protein-interaction domains. By "protein-interaction domain" herein is meant a
polypeptide region that interacts with other biomolecules, including other proteins, nucleic acids, lipids,
etc. These protein domains frequently provide regions that induce formation of specific
multiprotein complexes for recruiting and confining proteins to appropriate cellular locations, or that affect
the specificity of interaction with target ligands, such as between protein kinases and their substrates. Thus, many
of these protein domains are found in signaling proteins. Protein-interaction domains comprise
modules or micro-domains of about 20-150 amino acids that can be expressed in isolation and
bind to their physiological partners. Many different interaction domains are known, most of which fall
into classes related by sequence or ligand binding properties. Accordingly, the genes of interest
comprising interaction domains may comprise proteins that are members of these classes of protein
domains and their relevant binding partners. These domains include, among others, the SH2 domain
(src homology domain 2), SH3 domain (src homology domain 3), PTB domain (phosphotyrosine
binding domain), FHA domain (forkhead-associated domain), WW domain, 14-3-3 domain,
pleckstrin homology domain, C1 domain, C2 domain, FYVE domain (Fab-1, YGL023, Vps27, and
EEA1), death domain, death effector domain, caspase recruitment domain, Bcl-2 homology domain,
bromodomain, chromatin organization modifier domain, F-box domain, HECT domain, RING domain
(Zn2+ finger binding domain), PDZ domain (PSD-95, discs large, and zona occludens domain), sterile
alpha motif domain, ankyrin domain, ARM domain (armadillo repeat motif), WD40 domain, EF-hand
(calretinin), PUB domain (Suzuki, T. et al. (2001) Biochem. Biophys. Res. Commun. 287: 1083-87),
nucleotide binding domain, Y-box binding domain, and H.G. domain, all of which are well known in the art.

Since protein-interaction domains are pervasive in cellular signal transduction cascades and other
cellular processes, such as cell cycle regulation and protein degradation, expression of single proteins
or multiple proteins with interaction domains acting in a specific signaling or regulatory pathway may
provide a basis for inactivating, activating, or modulating such pathways in normal and diseased cells.
In another aspect, the preferred embodiments comprise binding partners of these interaction
domains, which are well known to those skilled in the art or are identifiable by well-known methods (e.g.,
the yeast two-hybrid technique, co-precipitation of immune complexes, etc.).

Included within the protein-interaction domains are transcriptional activation domains capable of
activating transcription when fused to an appropriate DNA binding domain. Transcriptional activation
domains are well known in the art. These include activator domains from GAL4 (amino acids 1-147;
Fields, S. et al. (1989) Nature 340: 245-46; Gill, G. et al. (1990) Proc. Natl. Acad. Sci. USA 87: 2127-31),
GCN4 (Hope, I.A. et al. (1986) Cell 46: 885-94), ARD1 (Thukral, S.K. et al. (1989) Mol. Cell. Biol.
9: 2360-69), human estrogen receptor (Kumar, V. et al. (1987) Cell 51: 941-51), VP16 (Triezenberg,
S.J. et al. (1988) Genes Dev. 2: 718-29), Sp1 (Courey, A.J. (1988) Cell 55: 887-98), AP-2 (Williams,
T. et al. (1991) Genes Dev. 5: 670-82), and NF-kB p65 subunit and related Rel proteins (Moore, P.A.
et al. (1993) Mol. Cell. Biol. 13: 1666-74). DNA binding domains include, among others, the leucine zipper
domain, homeobox domain, Zn2+ finger domain, paired domain, LIM domain, ETS domain, and T-box
domain.

Since the genes of interest may comprise DNA binding domains and transcriptional activation
domains, other genes of interest useful for expression in the present invention are transcription
factors. Preferred transcription factors are those producing a cellular phenotype when expressed
within a particular cell type. Transcription factors as defined herein include both transcriptional
activators and inhibitors. As not all cells will respond to expression of a particular transcription factor,
those skilled in the art can choose appropriate cell strains in which expression of a transcription factor
results in dominant or altered phenotypes, as described below.

In another aspect, the transcription factor regulates expression of a different promoter of interest on
an expression vector that does not encode the transcription factor. This arrangement requires
introducing a plurality of vectors into a single cell, as described below, one of which expresses
the transcription factor regulating the different promoter of interest. Expression of the transcription
factor may be made inducible, or the transcription factor itself may be an inducible transcription factor,
thus allowing further regulation of the different promoter of interest.

In an alternative embodiment, the transcription factor encoded by the gene of interest regulates the
promoter on the expression vector encoding the transcription factor. Thus, these constructs are
autoregulatory for expression of the fusion nucleic acid (Hofmann, A. (1996) Proc. Natl. Acad. Sci.
USA 93: 5185-90). Accordingly, if the transcription factor inhibits the promoter activity on the
expression vector, continued synthesis of the transcription factor restricts expression of the fusion
nucleic acid. On the other hand, if the transcription factor activates transcription, expression is elevated
by continued synthesis of the transcriptional activator. Consequently, by use of separation
sequences to express a plurality of genes of interest, one of which encodes the transcription factor,
the retroviral vector autoregulates expression of the genes of interest. To enhance autoregulation, the
transcription factor is an inducible transcription factor, for example a tetracycline- or steroid-inducible
transcription factor (e.g., RU-486 or ecdysone inducible; see White, J.H. (1997) Adv. Pharmacol. 40:
339-67). Incorporation of an inducible transcription factor in a retroviral vector as a single
autoregulatory cassette eliminates the need for additional vectors for regulating the promoter activity.
Moreover, this system results in rapid, uniform expression of the gene(s) of interest.
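The two feedback cases above can be illustrated with a toy deterministic model. The Python sketch below uses Hill functions, a standard textbook description of transcriptional feedback rather than anything specified in this document; the synthesis rate, threshold, Hill coefficient, and basal leak are all hypothetical, and protein level P is in arbitrary units.

```python
def steady_state(production, gamma: float = 1.0, p0: float = 0.0,
                 dt: float = 0.01, steps: int = 5000) -> float:
    """Forward-Euler integration of dP/dt = production(P) - gamma * P; returns P at the end."""
    p = p0
    for _ in range(steps):
        p += dt * (production(p) - gamma * p)
    return p


BETA, K, N = 10.0, 1.0, 2.0  # hypothetical max synthesis rate, threshold, Hill coefficient


def unregulated(p: float) -> float:
    return BETA  # constitutive promoter, no feedback


def self_repressing(p: float) -> float:
    return BETA / (1.0 + (p / K) ** N)  # factor inhibits its own promoter


def self_activating(p: float) -> float:
    basal = 0.5  # hypothetical leaky transcription needed to start the positive loop
    return basal + BETA * (p / K) ** N / (1.0 + (p / K) ** N)


if __name__ == "__main__":
    for name, f in (("unregulated", unregulated),
                    ("self-repressing", self_repressing),
                    ("self-activating", self_activating)):
        print(name, round(steady_state(f), 2))
```

In this model the self-repressing construct settles at P = 2 (the root of P^3 + P - 10 = 0), well below the unregulated level of 10, while the self-activating construct settles slightly above it. An inducible factor could be mimicked by making BETA depend on inducer concentration.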

In another preferred embodiment, the gene of interest encodes a protein whose expression has a
dominant effect on the cell (i.e., produces an altered cellular phenotype). By "dominant effect" herein
is meant that the protein or peptide produces an effect upon the cell in which it is expressed, or on
another cell not expressing the dominant effect protein, that is detected by the methods described
below. The dominant effect may act directly on the cell to produce the phenotype or act indirectly on a
second molecule, which leads to a specific phenotype. A dominant effect may be produced by introducing
small-molecule effectors into cells, by expressing a single protein, or by expressing multiple proteins
acting in combination (e.g., proteins acting synergistically on a cellular pathway or a multisubunit
protein effector). As is well known in the art, expression of a variety of genes of interest may produce
a dominant effect. Expressed proteins may be mutant proteins that are constitutive for a biological
activity (Segouffin-Cariou, C. et al. (2000) J. Biol. Chem. 275: 3568-76; Luo et al. (1997) Mol. Cell.
Biol. 17: 1562-71) or inactive forms that sequester or inhibit the activity of normal binding partners
(Bossu, P. (2000) Oncogene 19: 2147-54; Mochizuki, H. (2001) Proc. Natl. Acad. Sci. USA 98: 10918-23).
The inactive forms as defined herein include expression of small modular protein-interaction
regions or other domains that bind to binding partners in the cell (see, for example, Gilchrist, A. et al.
(1999) J. Biol. Chem. 274: 6610-16). Dominant effects are also produced by overexpression of
normal cellular proteins, expression of proteins not normally expressed in a particular cell type, or
expression of normally functioning proteins in cells lacking functional proteins due to mutations or
deletions (Takihara, Y. et al. (2000) Carcinogenesis 21: 2073-77; Kaplan, J.B. (1994) Oncol. Res. 6:
611-15). Random peptides or biased random peptides introduced into cells can also produce
dominant effects. For example, random peptides that bind to the Src SH3 domain increase Src
activity, because the peptides antagonize the negative regulation of Src (see Sparks, A.B. et al. (1994)
J. Biol. Chem. 269: 23853-56).

As defined herein, a dominant effect is not restricted to the effect on the cell expressing the protein. A
dominant effect may be on a cell contacting the expressing cell, or may result from secretion of the protein
encoded by the gene of interest into the cellular medium. Proteins with dominant effects on other cells are
conveniently directed to the plasma membrane or to the secretory pathway by incorporating appropriate secretion
and/or membrane localization signals. These membrane-bound or secreted dominant effector
proteins may comprise cytokines and chemokines, growth factors, toxins (e.g., neurotoxins),
extracellular proteases (e.g., metalloproteases), cell surface receptor ligands (e.g., sevenless-type
receptor ligands), adhesion proteins (e.g., L1, cadherins, integrins, laminin), etc.

In an alternative embodiment, the gene of interest encodes a conditional gene product. By
"conditional gene product" herein is meant a gene product whose activity is only apparent under
certain conditions, for example at particular ranges of temperature. Other factors that conditionally
affect activity of a protein include, but are not limited to, ion concentration, pH, and light (see Hager, A.
(1996) Planta 198: 294-99; Pavelka, J. (2001) Bioelectromagnetics 22: 371-83). A conditional gene
product produces a specific cellular phenotype under a restrictive condition. In contrast, the
conditional gene product does not produce that phenotype under permissive conditions.
Methods for making or isolating conditional gene products are well known (see, for example, White,
D.W. et al. (1993) J. Virol. 67: 6876-81; Parini, M.C. (1999) Chem. Biol. 6: 679-87).

As is appreciated by those skilled in the art, conditional gene products are useful in examining genes
that are detrimental to a cell's survival or in examining cellular biochemical and regulatory pathways in
which the gene product functions. For gene products that affect cell survival, use of conditional
gene products allows the cells to survive under permissive conditions but results in lethality or
detriment at the restrictive condition. This feature allows screens at the restrictive condition for
candidate agents, such as proteins and small molecules, that may directly or indirectly suppress the
effect of a conditional gene product, while the cells can still be maintained and grown under permissive
conditions. In addition, conditional gene products are also useful in screens for regulators of cell
physiology when the conditional gene product is a participant in a cellular regulatory pathway. At the
restrictive condition, the conditional gene product ceases to function or becomes activated, resulting in
an altered cell phenotype due to dysregulation of the regulatory pathway. Candidate agents are then
screened for their ability to activate or inhibit downstream pathways to bypass the disrupted regulatory
point. Conditional gene products are well known in the art and include, among others, proteins such
as dynamin, involved in the endocytic pathway (Damke, H. et al. (1995) Methods Enzymol. 257: 209-20);
p53, involved in tumor suppression (Pochampally, R. et al. (2000) Biochem. Biophys. Res. Comm. 279:
1001-10 and Buckbinder, L. et al. (1994) Proc. Natl. Acad. Sci. USA 91: 10640-44); Vac1, involved in
vesicle sorting; proteins involved in viral pathogenesis (SV40 large T antigen; Robinson, C.C. (1980)
J. Virol. 35: 246-48); and gene products involved in regulating the cell cycle, such as the
ubiquitin-conjugating enzyme CDC34 (Ellison, K.S. et al. (1991) J. Biol. Chem. 266: 24116-20).

In another preferred embodiment, the gene of interest comprises a multiple cloning site (MCS). This
allows cassetting in of various genes of interest into the expression vectors. In one preferred
embodiment, the MCS lacks nucleotide sequences capable of functioning as a translation initiation
site, which allows cloning of a gene of interest containing its own translation initiation sequences.
Alternatively, the MCS comprises a peptide or protein coding region with its own translation initiation
sequence for expressing proteins or peptides lacking a translation initiation sequence. In addition,
other nucleic acid sequences that increase expression of the first gene of interest (e.g., codons for Gly
or GlyGly following the initiating methionine residue) may be included in the multiple cloning site. The coding
region may also comprise an indicator gene, such as lacZ, to permit identification of inserts by
insertional inactivation of lacZ. In these constructs, use of a promoter controlling element capable of
being active in both eukaryotes and prokaryotes will allow detection of lacZ in prokaryotes during the
cloning process (see Wirtz, E. et al. (1995) Science 268: 1179-83). In either case, a separation
sequence chosen from protease-based, IRES-based, or Type 2A-based sequences is operably linked
to the multiple cloning site. When at least one of the genes of interest comprises rGFP or pGFP,
expression of the fluorescent proteins allows monitoring of expression of a gene of interest cloned into
the MCS.
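Whether a candidate MCS lacks a translation initiation site can be checked in silico before cloning. This Python sketch scans only the sense strand for ATG triplets in all three frames; it ignores non-AUG start codons (such as CUG) and Kozak context, and the restriction-site layout in the example is hypothetical.

```python
def atg_positions(seq: str) -> list[int]:
    """Return 0-based positions of every ATG triplet on the given (sense) strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 2) if s[i:i + 3] == "ATG"]


def lacks_start_codon(mcs: str) -> bool:
    """True if the sense strand of the MCS contains no ATG in any reading frame."""
    return not atg_positions(mcs)


if __name__ == "__main__":
    # Hypothetical MCS built from EcoRI, BamHI, XhoI and NotI sites joined end to end
    mcs = "GAATTC" + "GGATCC" + "CTCGAG" + "GCGGCCGC"
    print(lacks_start_codon(mcs))
    # An NcoI site (CCATGG) would introduce an ATG
    print(atg_positions("GAATTCCATGG"))
```

The first call reports that the assumed MCS is free of ATG triplets, including across site junctions; the second shows why a site such as NcoI would be avoided when the inserted gene must supply its own initiation codon.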

In yet another preferred embodiment, the gene of interest comprises candidate bioactive agents
comprising candidate nucleic acids, as described below. Thus, a gene of interest may comprise
candidate bioactive agents in the form of cDNAs, cDNA fragments, genomic DNA fragments, and
nucleic acids encoding random or biased random peptides, as described below. Expression of fusion
nucleic acids in which the gene of interest is a candidate agent allows selection of cells expressing the
candidate agent based on expression of the rGFP or pGFP.

In the present invention, there is no particular order of the first gene of interest and the second gene of
interest. When at least one of the genes of interest is rGFP or pGFP, a preferred embodiment may
have a gene of interest upstream of the GFP. Another preferred embodiment may have the GFP
upstream and the gene of interest downstream. By "upstream" and "downstream" herein is meant the
proximity to the point of transcription initiation, which is generally localized 5' to the coding sequence of
the fusion nucleic acid. Thus, in a preferred embodiment, the upstream position is more proximal to
the transcription initiation site than the downstream position.

The positioning of the gene of interest relative to the GFP is chosen by the person skilled in the art.
Factors to consider include the need for detecting expression of a gene of interest or optimizing the
synthesis of a protein of interest. In the embodiments described above, the GFP gene may be placed
downstream of the gene of interest so that expression of the GFP will be a faithful indication of
expression of the gene of interest. This will depend on the types of separation sites chosen by the
person skilled in the art. When protease cleavage or Type 2A separation sequences are incorporated
into the fusion nucleic acid, a GFP or other reporter gene situated downstream of the gene of interest
will generally provide direct information on expression of the gene of interest. In the case of IRES
sequences, however, detecting expression of the GFP or reporter gene to monitor expression of an
upstream gene of interest is less direct, since separate translation initiation events occur for the first
gene of interest and the second gene of interest, generally resulting in a lower amount of the second
protein being made. In some cases, the ratio of expression of the first and second proteins can be as
high as 10:1.

The order of the genes of interest on the fusion nucleic acid and the choice of separation sequence are
also important when the relative amounts of the gene products are at issue. For example, use of IRES
sequences may result in lower amounts of downstream gene product as compared to the upstream GFP
gene product because of differing translation initiation rates. Relative levels of translation initiation are
easily determined by comparing expression of the upstream gene of interest versus the downstream gene of
interest. Where control of expression levels is important, the person skilled in the art will place the
gene whose product is needed at higher levels upstream of the other gene when IRES
separation sequences are used. Alternatively, multiple copies of IRES sequences are adaptable to
increase expression of the downstream gene. On the other hand, use of protease or Type 2A
separation sequences will lessen the need for ordering the genes of interest on the fusion nucleic acid,
since these separation sequences tend to produce equal levels of upstream and downstream gene
product.
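Comparing upstream and downstream expression reduces to a background-corrected ratio. The Python sketch below assumes the two signals have already been calibrated to comparable units (raw intensities from two different fluorophores are not directly comparable); the function name and measurements are hypothetical.

```python
from statistics import mean


def expression_ratio(upstream: list[float], downstream: list[float],
                     upstream_bg: float = 0.0, downstream_bg: float = 0.0) -> float:
    """Background-subtracted mean upstream signal divided by mean downstream signal.

    Assumes both signals are already in comparable, calibrated units.
    """
    up = mean(upstream) - upstream_bg
    down = mean(downstream) - downstream_bg
    if down <= 0:
        raise ValueError("downstream signal is at or below background")
    return up / down


if __name__ == "__main__":
    # Hypothetical calibrated measurements from cells carrying an IRES construct
    ratio = expression_ratio([1050, 950, 1000], [130, 110, 120], 20.0, 20.0)
    print(f"upstream:downstream = {ratio:.1f}:1")
```

With these assumed values the construct gives about 9.8:1, near the upper end described above for IRES constructs; a protease or Type 2A construct would be expected to give a ratio closer to 1:1.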

As will be appreciated by those skilled in the art, various combinations of genes of interest may be
used in the fusion nucleic acids of the present invention. In a preferred embodiment, at least one of
the genes of interest comprises an rGFP or pGFP gene, or its variants. In one aspect, the rGFP or
pGFP protein functions as a reporter protein for monitoring expression of the gene of interest. For
example, if the gene of interest is a nucleic acid encoding a dominant effect protein, a candidate agent
comprising cDNA, or a candidate nucleic acid encoding a random peptide, expression of rGFP or
pGFP provides a basis for selecting cells expressing the gene of interest and for monitoring their
expression levels. In another aspect, expression of the rGFP or pGFP along with a gene of interest
comprising another reporter or selection gene allows for increased discrimination in selecting cells
expressing the fusion nucleic acid. This increased selectivity is desirable when measuring promoter
activity, for example when screening for candidate agents affecting promoter activity.

In another preferred embodiment, at least one of the genes of interest comprises a fusion nucleic acid encoding an
rGFP or pGFP fusion protein. In one aspect, the rGFP or pGFP is fused to a cDNA, genomic DNA, or
nucleic acid encoding a random peptide. That is, the rGFP or pGFP fusion protein comprises
candidate agents, as described below. In these constructs, a gene of interest may comprise a
distinguishable reporter gene to monitor expression of the rGFP or pGFP fusion protein. In another
aspect, the gene of interest may comprise a dominant effect protein, a cell cycle gene, or a conditional
gene product that produces a specific cellular phenotype. This allows identification of candidate
agents, expressed by at least one of the genes of interest (i.e., the rGFP or pGFP fused to cDNA,
genomic DNA, or random peptides), that alter the cellular phenotype produced by another gene of
interest. In another aspect, the gene of interest may comprise a cellular biosensor, which allows
analysis of cell physiological events induced by expression of a separate rGFP or pGFP fusion protein.

When the vectors are used to express separate protein products encoded by the genes of interest, the
fusion nucleic acids further comprise separation sequences. By a "separation sequence" or
"separation site" or grammatical equivalents as used herein is meant a sequence that results in
protein products not linked by a peptide bond. Separation may occur at the RNA or protein level.
Being separate does not preclude the possibility that the protein products of the first gene of interest
and the second gene of interest interact either non-covalently or covalently following their synthesis.
Thus, the separate protein products may interact through hydrophobic domains, protein-interaction
domains, common bound ligands, or through formation of disulfide linkages between the proteins.

Various types of separation sequences may be employed. In one preferred embodiment, the
separation sequence encodes a recognition site for a protease. A protease recognizing the site
cleaves the translated protein product into two or more proteins. Preferred protease cleavage sites
and cognate proteases include, but are not limited to, prosequences of retroviral proteases including
human immunodeficiency virus protease, and sequences recognized and cleaved by trypsin (EP
578472; Takasuga, A. et al. (1992) J. Biochem. 112: 652-57), proteases encoded by picornaviruses
(Ryan, M.D. et al. (1997) J. Gen. Virol. 78: 699-723), factor Xa (Gardella, T.J. et al. (1990) J. Biol.
Chem. 265: 15854-59; WO 9006370), collagenase (J03280893; WO 9006370; Tajima, S. et al. (1991)
J. Ferment. Bioeng. 72: 362), clostripain (EP 578472), subtilisin (including mutant H64A subtilisin;
Forsberg, G. et al. (1991) J. Protein Chem. 10: 517-26), chymosin, yeast KEX2 protease
(Bourbonnais, Y. et al. (1988) J. Biol. Chem. 263: 15342-47), thrombin (Forsberg et al., supra; Abath,
F.G. et al. (1991) BioTechniques 10: 178), Staphylococcus aureus V8 protease or similar
endoproteinase Glu-C to cleave after Glu residues (EP 578472; Ishizaki, J. et al. (1992) Appl.
Microbiol. Biotechnol. 36: 483-86), NIa proteinase of tobacco etch virus (Parks, T.D. et
al. (1994) Anal. Biochem. 216: 413-17), endoproteinase Lys-C (U.S. Pat. No. 4,414,332),
endoproteinase Asp-N, Neisseria type 2 IgA protease (Pohlner, J. et al. (1992) Biotechnology 10:
799-804), soluble yeast endoproteinase yscF (EP 467839), chymotrypsin (Altman, J.D. et al. (1991)
Protein Eng. 4: 593-600), enteropeptidase (WO 9006370), lysostaphin, a polyglycine-specific
endoproteinase (EP 316748), the family of caspases (e.g., caspase 1, caspase 2, caspase 3, etc.), and
metalloproteases.
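When designing a protease-based separation site, it helps to predict where a given protease would cut a fusion protein. The Python sketch below uses commonly cited minimal recognition motifs for four of the proteases listed above (TEV NIa: ENLYFQ before G or S; thrombin: LVPR before GS; factor Xa: I(E/D)GR; enteropeptidase: DDDDK). Real specificity is broader and context dependent, so treat the motifs as approximations; the example sequences are hypothetical.

```python
import re

# Commonly cited minimal motifs (one-letter amino acid code).
# Each entry is (residues before the scissile bond, residues required after it).
SITES = {
    "TEV": ("ENLYFQ", "[GS]"),
    "thrombin": ("LVPR", "GS"),
    "factor Xa": ("I[ED]GR", ""),
    "enteropeptidase": ("DDDDK", ""),
}


def cut_positions(protein: str, protease: str) -> list[int]:
    """Indices i at which the protease is predicted to cut, between residues i-1 and i."""
    before, after = SITES[protease]
    # Zero-width match: fixed-width lookbehind for the P side, lookahead for the P' side
    pattern = re.compile(f"(?<={before})(?={after})")
    return [m.start() for m in pattern.finditer(protein.upper())]


def cleave(protein: str, protease: str) -> list[str]:
    """Fragments expected after complete digestion; empty fragments are dropped."""
    cuts = [0] + cut_positions(protein, protease) + [len(protein)]
    fragments = [protein[a:b] for a, b in zip(cuts, cuts[1:])]
    return [f for f in fragments if f]


if __name__ == "__main__":
    # Hypothetical His6-tag / TEV-site / partner fusion
    print(cleave("MHHHHHHENLYFQGAMS", "TEV"))
    print(cleave("AALVPRGSAA", "thrombin"))
```

Because the motif is matched with zero-width lookarounds, the residues on both sides of the scissile bond stay in the fragments, which mirrors what a protease leaves behind.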

The present invention also contemplates protease recognition sites identified from genomic DNA,
cDNA, or random nucleic acid libraries (see, for example, O'Boyle, D.R. et al. (1997) Virology 236: 338-47).
For example, the fusion nucleic acids of the present invention may comprise a separation site
that is a randomizing region for the display of candidate protease recognition sites. The first and
second genes of interest encode reporter molecules useful for detecting protease activity, such as
an rGFP or pGFP capable of undergoing FRET with another fluorescent protein to which it is linked
through a candidate recognition site (see Mitra, R.D. et al. (1996) Gene 173: 13-17). Proteases are
expressed in or introduced into cells expressing these fusion nucleic acids. Random peptide sequences
acting as substrates for the particular protease are cleaved, yielding separate GFP proteins and
thus a loss of FRET signal. By identifying classes of protease recognition sites, optimal or
novel protease recognition sequences may be determined.
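Identifying classes of recognition sites from such a screen amounts to selecting linkers whose FRET signal fell and then summarizing the hits. This Python sketch assumes one (before, after) FRET measurement per randomized linker and derives a simple per-position majority consensus; the 50% threshold, function names, and library values are all hypothetical.

```python
from collections import Counter


def fret_loss(before: float, after: float) -> float:
    """Fractional drop in FRET signal after exposure to the protease."""
    if before <= 0:
        raise ValueError("baseline FRET signal must be positive")
    return (before - after) / before


def select_hits(library: dict[str, tuple[float, float]], threshold: float = 0.5) -> list[str]:
    """Candidate linker sequences whose FRET signal fell by at least `threshold`."""
    return [seq for seq, (b, a) in library.items() if fret_loss(b, a) >= threshold]


def consensus(hits: list[str]) -> str:
    """Most frequent residue at each position of equal-length hit sequences."""
    if not hits or len({len(h) for h in hits}) != 1:
        raise ValueError("need one or more hits of equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*hits))


if __name__ == "__main__":
    # Hypothetical FRET ratios (before, after protease) for randomized linkers
    library = {
        "ENLYFQG": (2.0, 0.6),
        "ENLYFQS": (2.1, 0.7),
        "EALYFQG": (1.9, 0.8),
        "GSGSGSG": (2.0, 1.9),
    }
    hits = select_hits(library)
    print(hits, consensus(hits))
```

A majority consensus is only a first summary; a position-specific frequency matrix built from the same hits would retain information about tolerated substitutions.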

In addition to their use in producing separate proteins of interest, the protease cleavage sites and the
cognate proteases are also useful in screening for candidate agents that enhance or inhibit protease
activity. Since many proteases, for example the HIV and caspase proteases, are crucial to the
pathogenesis of organisms or to cellular regulation, the ability to express reporter or selection proteins
linked by a protease cleavage site allows screens for therapeutic agents directed against a particular protease.

Another preferred embodiment of separation sequences is internal ribosome entry sites (IRES). By
"internal ribosome entry sites", "internal ribosome binding sites", "IRES elements", or grammatical
equivalents herein is meant sequences that allow CAP-independent initiation of translation (Kim, D.G.
et al. (1992) Mol. Cell. Biol. 12: 3636-43; McBratney, S. et al. (1993) Curr. Opin. Cell Biol. 5: 961-65).
IRES sequences appear to act by recruiting the 40S ribosomal subunit to the mRNA in the absence of
the translation initiation factors required for normal CAP-dependent translation initiation. IRES sequences
are heterogeneous in nucleotide sequence, RNA structure, and factor requirements for ribosome
binding. They are frequently located in the untranslated leader regions of RNA viruses, such as the
Picornaviruses. The viral sequences range from about 450-500 nucleotides in length, although IRES
sequences may also be shorter or longer (Adam, M.A. et al. (1991) J. Virol. 65: 4985-90; Borman,
A.M. et al. (1997) Nucleic Acids Res. 25: 925-32; Hellen, C.U. et al. (1995) Curr. Top. Microbiol.
Immunol. 203: 31-63; Mountford, P.S. et al. (1995) Trends Genet. 11: 179-84). Embodiments of viral
IRES separation sites are the Type I IRES sequences present in entero- and rhinoviruses and the Type II
sequences of cardioviruses and aphthoviruses (e.g., encephalomyocarditis virus; see Elroy-Stein, O. et
al. (1989) Proc. Natl. Acad. Sci. USA 86: 6126-30; Alexander, L. et al. (1994) Proc. Natl. Acad. Sci.
USA 91: 1406-10). Other viral IRES sequences are found in hepatitis A virus (Brown, E.A. et al.
(1994) J. Virol. 68: 1066-74), avian reticuloendotheliosis virus (Lopez-Lastra, M. et al. (1997) Hum.
Gene Ther. 8: 1855-65), Moloney murine leukemia virus (Vagner, S. et al. (1995) J. Biol. Chem. 270:
20376-83), short IRES segments of hepatitis C virus (Urabe, M. et al. (1997) Gene 200: 157-62), and
DNA viruses (e.g., Kaposi's sarcoma-associated herpesvirus; Bieleski, L. et al. (2001) J. Virol. 75: 1864-69).

Additionally, preferred embodiments of IRES sequences are non-viral IRES elements found in a
variety of organisms, including yeast, insects, birds, and mammals. Like the viral IRES sequences,
cellular IRES sequences are heterogeneous in sequence and secondary structure. Cellular IRES
sequences, however, may comprise shorter nucleic acid sequences than viral IRES
elements (Oh, S.K. et al. (1992) Genes Dev. 6: 1643-53; Chappell, S.A. et al. (2000) 97: 1536-41).
Specific IRES sequences include, but are not limited to, those involved in expression of
immunoglobulin heavy chain binding protein, transcription factors, protein kinases, protein
phosphatases, eIF4G (see Johannes, G. et al. (1999) Proc. Natl. Acad. Sci. USA 96: 13118-23;
Johannes, G. et al. (1998) RNA 4: 1500-13), vascular endothelial growth factor (Huez, I. et al. (1998)
Mol. Cell. Biol. 18: 6178-90), c-myc (Stoneley, M. et al. (2000) Nucleic Acids Res. 28: 687-94),
apoptotic protein Apaf-1 (Coldwell, M.J. et al. (2000) Oncogene 19: 899-905), DAP-5 (Henis-Korenblit,
S. et al. (2000) Mol. Cell. Biol. 20: 496-506), connexin (Werner, R. (2000) IUBMB Life 50: 173-76),
Notch-2 (Lauring, S.A. et al. (2000) Mol. Cell 6: 939-45), and fibroblast growth factor (Creancier, L. et
al. (2000) J. Cell Biol. 150: 275-81). As some IRES sequences function efficiently in particular
cell types, the person skilled in the art will choose IRES elements suited to the particular cells
being used to express the fusion nucleic acid. Moreover, multiple IRES sequences in various
combinations, in either homomultimeric or heteromultimeric arrangements constructed as tandem
repeats or connected via linkers, are useful for increasing the efficiency of translation initiation of the
genes of interest. The combinations of IRES elements comprise from 2 to 10 or more copies or
combinations of IRES sequences, depending on the efficiency of initiation desired.

In addition to their use as separation sequences, IRES elements serve as targets for therapeutic
agents, since IRES sequences mediate expression of proteins involved in viral pathogenesis or cellular
disease states. Thus, the present invention is applicable in screens for candidate agents that inhibit
IRES-mediated translation initiation events. In these constructs, the rGFP or pGFP may serve as a
reporter of IRES-mediated translation or may comprise the candidate agent being screened (e.g., when
expressed as a fusion protein with cDNAs or random peptides).

Another preferred embodiment of IRES elements is sequences in nucleic acid libraries or random nucleic
acid libraries that function as IRES elements. Screens for these IRES-type sequences can employ
fusion nucleic acids containing bicistronically arranged genes of interest encoding reporter genes,
selection genes, or combinations thereof. Genomic, cDNA, or random nucleic acid sequences are
inserted between the two reporter or selection genes. After introducing the nucleic acid construct into
cells, for example by retroviral delivery, the cells are screened for expression of the downstream gene
mediated by functional IRES sequences. Selection is based on expression of the selection gene or
reporter gene (e.g., FACS analysis for expression of a downstream rGFP or pGFP gene). The
upstream gene of interest serves to permit monitoring of expression of the fusion nucleic acid. The
length of the nucleic acids screened is preferably 6 to 100 nucleotides, although longer nucleic acids
may be used.
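The gating logic of this bicistronic screen can be sketched in Python as follows. The sketch assumes that each cell yields one upstream-reporter intensity and one downstream rGFP/pGFP intensity from FACS analysis, and that fixed intensity gates separate positive from negative cells; the function name and the gate values are hypothetical.

```python
def ires_positive_fraction(events, upstream_min, downstream_min):
    """events: (upstream_reporter, downstream_gfp) intensity pairs, one per cell.

    Cells below `upstream_min` are treated as not expressing the construct and
    are excluded; among the rest, return the fraction whose downstream rGFP/pGFP
    signal reaches `downstream_min` (evidence of IRES-driven translation)."""
    expressing = [(up, down) for up, down in events if up >= upstream_min]
    if not expressing:
        return 0.0
    hits = sum(1 for _, down in expressing if down >= downstream_min)
    return hits / len(expressing)


cells = [(10, 1), (10, 50), (0, 50), (20, 60)]
print(ires_positive_fraction(cells, upstream_min=5, downstream_min=20))
```

Normalizing to upstream-positive cells reflects the role of the upstream gene described above: it separates cells that express the construct but lack IRES activity from cells that do not express the construct at all.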

The present invention further contemplates use of enhancers of IRES-mediated translation initiation.
IRES-initiated translation may be enhanced by any number of methods. Cellular expression of virally
encoded proteases, which cleave eIF4F to remove CAP-binding activity from the 40S ribosome
complexes, may be employed to increase preference for IRES translation initiation events. These
proteases are found in some Picornaviruses and can be expressed in a cell by introducing the viral
protease gene by transfection or retroviral delivery (Roberts, L.O. (1998) RNA 4: 520-29). Other
enhancers adaptable for use with IRES elements include cis-acting elements, such as the 3' untranslated
region of hepatitis C virus (Ito, T. et al. (1998) J. Virol. 72: 8789-96) and polyA segments (Bergamini,
G. et al. (2000) RNA 6: 1781-90), which may be included as part of the fusion nucleic acid of the
present invention. In addition, preferential use of cellular IRES sequences may occur when
CAP-dependent mechanisms are impaired, for example by dephosphorylation of 4E-BP, proteolytic
cleavage of eIF4G, or when cells are placed under stress by γ-irradiation, amino acid starvation, or
hypoxia. Thus, in addition to the methods described above, IRES-enhancing procedures include
activation or introduction of 4E-BP-targeted phosphatases or of eIF4G proteases. Alternatively, the
cells are subjected to the stress conditions described above. Other trans-acting IRES enhancers include
heterogeneous nuclear ribonucleoprotein (hnRNP) (Kaminski, A. et al. (1998) RNA 4: 626-38), PTB,
hnRNP E2/PCBP2 (Walter, B.L. et al. (1999) RNA 5: 1570-85), La autoantigen (Meerovitch, K. et al.
(1993) J. Virol. 67: 3798-807), unr (Hunt, S.L. et al. (1999) Genes Dev. 13: 437-48), ITAF45/Mpp1
(Pilipenko, E.V. et al. (2000) Genes Dev. 14: 2028-45), DAP5/NAT1/p97 (Henis-Korenblit, S. et al.
(2000) Mol. Cell. Biol. 20: 496-506), and nucleolin (Izumi, R.E. et al. (2001) Virus Res. 76: 17-29).
These factors may be introduced into a cell either alone or in combination. Accordingly, various
combinations of IRES elements and enhancing factors are used to effect a separation reaction.

In another preferred embodiment, the separation sites are Type 2A separation sequences. By "Type
2A" sequences herein is meant nucleic acid sequences that, when translated, inhibit formation of
peptide linkages. Type 2A sequences are distinguished from IRES sequences in that 2A sequences
do not involve CAP-independent translation initiation. Without being bound by theory, Type 2A
sequences appear to act by disrupting peptide bond formation between the nascent polypeptide chain
and the incoming activated prolyl-tRNA (Donnelly, M.L. et al. (2001) J. Gen. Virol. 82: 1013-25). Although
the peptide bond fails to form, the ribosome continues to translate the remainder of the RNA to
produce separate peptides unlinked at the carboxy terminus of the 2A peptide region. An advantage
of Type 2A separation sequences over IRES elements is that near-stoichiometric amounts of the first
protein of interest and second protein of interest are made. Moreover, Type 2A sequences
do not appear to require additional factors, such as the proteases that are required to effect separation
when using protease recognition sites.

Preferred Type 2A separation sequences are those found in cardioviral and aphthoviral genomes.
These sequences are approximately 21 amino acids long and have the general sequence
XXXXXXXXXXUOCXDXEXNPGP, where X is any amino acid. Disruption of peptide bond formation
occurs between the glycine (G) and proline (P) at the carboxy terminus of this sequence. These 2A
sequences are found in the aphthovirus foot-and-mouth disease virus (FMDV) and in the cardioviruses
Theiler's murine encephalomyelitis virus (TME) and encephalomyocarditis virus (EMC). Various viral Type 2A
sequences are shown in Figure 9. The 2A sequences function in a wide range of eukaryotic
expression systems, thus allowing their use in a variety of cells and organisms. Accordingly, inserting
these 2A separation sequences between the nucleic acids encoding the first gene of interest and
second gene of interest, as more fully explained below, will lead to expression of separate protein
products of the first gene of interest and the second gene of interest.
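The effect of a 2A sequence on the translated product can be modelled with a short Python sketch. It matches only the conserved carboxy-terminal core legible in the consensus above (D-x-E-x-N-P-G-P) and splits between the final G and P; this is an assumption for illustration, since real separating activity also depends on upstream 2A residues, so a match here does not prove activity. The function and pattern names are hypothetical.

```python
import re

# Only the conserved carboxy-terminal core of the consensus, D-x-E-x-N-P-G-P,
# is matched; the lookahead leaves the final P for the downstream product.
TWO_A_CORE = re.compile(r"D.E.NPG(?=P)")


def separate_at_2a(polypeptide):
    """Split a one-letter protein sequence at each 2A core, modelling the
    missing G-P peptide bond: the upstream product ends ...NPG and the
    downstream product begins with P."""
    products, start = [], 0
    for match in TWO_A_CORE.finditer(polypeptide):
        products.append(polypeptide[start:match.end()])
        start = match.end()
    products.append(polypeptide[start:])
    return products


print(separate_at_2a("MKAGDVESNPGPSTR"))  # ['MKAGDVESNPG', 'PSTR']
```

The upstream product therefore carries the 2A peptide at its carboxy terminus, while the downstream product starts with the proline, consistent with the separation point described above.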

In another embodiment, the present invention contemplates mutated versions or variants of Type 2A
sequences. By "mutated" or "variant" or grammatical equivalents herein is meant deletions, insertions,
transitions, or transversions of nucleic acid sequences that exhibit the same qualitative separating activity
as the naturally occurring analogue, although preferred mutants or variants have more
efficient separating activity and efficient translation of the downstream gene of interest. Mutant
variants include changes in nucleic acid sequence that do not change the corresponding 2A amino
acid sequence but incorporate frequently used codons (i.e., are codon optimized) to allow efficient
translation of the 2A region (see Zolotukhin, S. et al. (1996) J. Virol. 70: 4646-54). In another aspect,
the mutant variants are changes in nucleic acid sequence that change the corresponding 2A amino
acid sequence. Thus, one embodiment of variant 2A sequences is short deletions of the 20 amino
acid 2A sequence that retain separating activity. The deletion may comprise removal of about 3 to 6
amino acids at the amino terminus of the 2A region. In another embodiment, Type 2A sequences are
mutated by methods well known in the art, such as chemical mutagenesis, oligonucleotide-directed
mutagenesis, and error-prone replication. Mutants with altered separating activity are readily identified
by examining expression of the fusion nucleic acids of the present invention. Assaying for production
of a separate downstream gene product, such as a reporter protein or a selection protein, allows
identification of sequences having separating activity. Another method for identifying variants uses a
FRET-based assay with linked GFP molecules, as described above. Inserting variant 2A
sequences in place of, or adjacent to, the gly-ser linker region, or other suitable regions linking the
GFPs, allows detection of functional 2A separation sequences by identifying constructs that
produce separated GFP molecules, as measured by loss of FRET signal. Sequences having no or
reduced separating activity will retain higher levels of FRET signal due to physical linkage of the GFP
molecules. This strategy permits high-throughput analysis of variants and allows selection of
sequences having high-efficiency Type 2A separating activity.
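Ranking variants in such a high-throughput FRET assay might look like the following Python sketch. It assumes that, in addition to the variants, two controls are measured: GFPs joined by a non-separating linker (maximal FRET) and GFPs expressed as separate proteins (minimal FRET), and that a variant's FRET ratio falls linearly between them in proportion to the fraction of separated molecules. The linear model and all names are assumptions made for illustration.

```python
def separation_efficiency(ratio, linked_ratio, separated_ratio):
    """Estimate the fraction of separated molecules from a FRET ratio by linear
    interpolation between a fully linked control (0.0) and separately expressed
    GFPs (1.0); the result is clamped to [0, 1]."""
    span = linked_ratio - separated_ratio
    if span <= 0:
        raise ValueError("linked control must show more FRET than separated control")
    efficiency = (linked_ratio - ratio) / span
    return min(1.0, max(0.0, efficiency))


def rank_variants(variant_ratios, linked_ratio, separated_ratio):
    """Return (variant, efficiency) pairs, most efficient separators first."""
    scored = [(name, separation_efficiency(r, linked_ratio, separated_ratio))
              for name, r in variant_ratios.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


print(rank_variants({"v1": 2.0, "v2": 0.5, "v3": 1.25}, linked_ratio=2.0, separated_ratio=0.5))
# [('v2', 1.0), ('v3', 0.5), ('v1', 0.0)]
```

Clamping keeps measurement noise outside the control range from producing efficiencies below 0 or above 1.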

In yet another embodiment, Type 2A separation sequences include homologs present in other nucleic
acids, including nucleic acids of other viruses, bacteria, yeast, and multicellular organisms such as
worms, insects, birds, and mammals. Homology in this context means sequence similarity or identity.
A variety of sequence-based alignment methodologies, which are well known to those skilled in the art,
are useful in identifying homologous sequences. These include, but are not limited to, the local homology
algorithm of Smith, T.F. and Waterman, M.S. (1981) Adv. Appl. Math. 2: 482-89; the search for similarity
method of Pearson, W.R. and Lipman, D.J. (1988) Proc. Natl. Acad. Sci. USA 85: 2444-48; the Basic
Local Alignment Search Tool (BLAST) described by Altschul, S.F. et al. (1990) J. Mol. Biol. 215:
403-10; the Best Fit program described by Devereux, J. et al. (1984) Nucleic Acids Res. 12: 387-95;
and the FASTA and TFASTA alignment programs, preferably using default settings, or by inspection.
In one preferred embodiment, similarity or identity for any nucleic acid or protein outlined herein is
calculated by FAST alignment algorithms based upon the following parameters: mismatch penalty of
1.0; gap size penalty of 0.33; joining penalty of 30 (see "Current Methods in Comparison and Analysis"
in Macromolecule Sequencing and Synthesis: Selected Methods and Applications, pp. 127-149, Alan R.
Liss, Inc., 1998). Another example of a useful algorithm is PILEUP. PILEUP creates a multiple
sequence alignment from a group of related sequences using progressive, pairwise alignments. It can
also plot a tree showing the clustering relationships used to create the alignment. PILEUP uses a
simplification of the progressive alignment method of Feng, D.F. and Doolittle, R.F. (1987) J. Mol.
Evol. 25: 351-60, which is similar to the method described by Higgins, D.G. and Sharp, P.M. (1989)
CABIOS 5: 151-3. Useful parameters include a default gap weight of 3.00, a default gap length weight
of 0.10, and weighted end gaps.

Another example of a useful algorithm is the family of BLAST alignment tools initially described by
Altschul et al. (see also Karlin, S. et al. (1993) Proc. Natl. Acad. Sci. USA 90: 5873-87). A particularly
useful BLAST program is the WU-BLAST-2 program described in Altschul, S.F. et al. (1996) Methods
Enzymol. 266: 460-80. WU-BLAST-2 uses several search parameters, most of which are set to default
values. The adjustable parameters are set with the following values: overlap span = 1, overlap fraction
= 0.125, word threshold (T) = 11. The HSP S and HSP S2 parameters are dynamic values and are
established by the program itself depending upon the composition of the particular sequence and
composition of the particular database against which the sequence of interest is being searched;
however, the values may be adjusted to increase sensitivity. A % amino acid sequence identity value
is determined by the number of matching identical residues divided by the total number of residues of
the longer sequence in the aligned region. The "longer" sequence is the one having the most actual
residues in the aligned region (gaps introduced by WU-BLAST-2 to maximize the alignment score are
ignored).
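The identity definition above can be written as a small Python function operating on an already aligned pair of sequences. It assumes the alignment is given as two equal-length strings with "-" marking gaps; producing the alignment itself (for example with WU-BLAST-2) is outside the sketch, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """% identity over an aligned region: identical residue pairs divided by the
    residue count of the longer sequence in the region. Gap characters ('-')
    are not residues, so they never count as matches or toward the length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    longer = max(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    return 100.0 * matches / longer


# 8 identical residues; the longer sequence has 9 residues in the region.
print(percent_identity("GDVESNPGP", "GDVE-NPGP"))  # about 88.9
```

Dividing by the longer sequence, rather than by the alignment length, keeps gap columns from inflating the denominator.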

In a similar manner, "percent (%) nucleic acid sequence identity" with respect to the coding sequence
of the polypeptide described herein is defined as the percentage of nucleotide residues in a
candidate sequence that are identical with the nucleotide residues in the coding sequence of the Type
2A regions. A preferred method utilizes the BLASTN module of WU-BLAST-2 set to the default
parameters, with overlap span and overlap fraction set to 1 and 0.125, respectively.

An additional useful algorithm is gapped BLAST as reported by Altschul, S.F. et al. (1997) Nucleic
Acids Res. 25: 3389-402. Gapped BLAST uses BLOSUM-62 substitution scores; a threshold
parameter of 9; the two-hit method to trigger ungapped extensions; a charge of 10+k for a gap of
length k; Xu set to 16; and Xg set to 40 for the database search stage and to 67 for the output stage
of the algorithms. Gapped alignments are triggered by a score corresponding to ~22 bits.
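The gap charge of 10+k can be illustrated with a short Python function that totals the charge for one row of an alignment. The sketch assumes gaps are written as runs of "-" and treats each run as one gap of length k; the function name and its parameters are hypothetical, and it covers only gap costs, not the BLOSUM-62 substitution scores.

```python
import re


def gap_cost(aligned_row, open_cost=10, per_residue=1):
    """Total gap charge for one alignment row when a gap of length k costs
    open_cost + per_residue * k (10 + k in the scheme described above)."""
    return sum(open_cost + per_residue * len(run)
               for run in re.findall(r"-+", aligned_row))


print(gap_cost("GD--VESN-PGP"))  # (10 + 2) + (10 + 1) = 23
```

Because the opening term dominates, one long gap is cheaper than several short ones covering the same number of residues.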

The alignment may include the introduction of gaps in the sequences to be aligned. In addition, for
sequences that contain either more or fewer amino acids than the Type 2A sequences in Figure 3, it is
understood that the percentage of homology will be determined based on the number of
homologous amino acids in relation to the total number of amino acids. Thus, Type 2A sequences
may be shorter or longer than the amino acid sequences shown in Figure 3.

Another embodiment of Type 2A separating sequences is sequences present in libraries of
nucleic acids, including genomic DNA or cDNA, that have Type 2A separating activity. By "Type 2A
separating activity" herein is meant a nucleic acid that encodes an amino acid sequence exhibiting
separating activity similar to that of the naturally occurring Type 2A sequences. Segments of nucleic acids
are inserted between the first gene of interest and second gene of interest in the fusion nucleic acids
of the present invention and examined for separating activity as described above. The preferred
lengths to be tested are nucleic acids encoding peptides of 5 to 50 amino acids or larger, with a more
preferred range of peptides 10-30 amino acids long.

Embodiments of Type 2A sequences also encompass random nucleic acid libraries encoding peptides
that have Type 2A separating activity. In these embodiments, the separation site represents a
randomizing region where random or biased random nucleic acids encoding random or biased
random peptides are inserted between the first gene of interest and second gene of interest. The
preferred lengths of the random nucleic acids are nucleic acids encoding peptides of 5 to 50 amino
acids, with a more preferred range of peptides of 10-30 amino acids. Random peptides having
separating activity are identified using the assays described above. Identification of functional
separation sequences will permit additional searches for related sequences having Type 2A-like
separating activity, either through homology searches, mutagenesis screens, or use of biased
random peptide sequences. Sequences with separating activity can then be used to express separate
proteins of interest according to the present invention.

In a preferred embodiment, the genes of interest are linked to a fusion partner to form a fusion
polypeptide as described above. In a preferred embodiment, combinations of fusion partners are
used, with or without linkers.

As will be appreciated by those skilled in the art, the fusion nucleic acids of the present invention are
not limited to a fusion nucleic acid comprising only a promoter, first gene of interest, separation
sequence, and second gene of interest. Any number of separation sequences and genes of interest
may be used in the fusion nucleic acid. Additional separation sequences may be chosen from
protease-based, IRES-based, or Type 2A-based separating sequences and added to the fusion nucleic
acids along with additional genes of interest. Consequently, a preferred embodiment further
comprises a plurality of separating sequences and genes of interest. Thus, in one aspect, the fusion
nucleic acid comprises a second separating sequence and a third gene of interest, and may further
comprise a third separating sequence and a fourth gene of interest. As will be appreciated by those
skilled in the art, by inserting additional separating sequences and additional genes of interest, any
number of proteins encoded by the genes of interest may be separately expressed. Additional
separating sequences and genes of interest may be desired in screening methods where the first and
second genes of interest encode reporter proteins whose activities are affected by a third gene of
interest, or where expression of more than two genes of interest is necessary to produce a cellular
phenotype.

The nucleic acids and the fusion nucleic acids described herein can be prepared using standard
recombinant DNA techniques described in, for example, Sambrook, J. et al., Molecular Cloning: A
Laboratory Manual, 2nd edition, Cold Spring Harbor Press, Cold Spring Harbor, New York, 1989, and
Ausubel, F. et al., Current Protocols in Molecular Biology, Greene Publishing Associates and John
Wiley & Sons, New York, NY, 1994. Generally, the expression vectors also contain the required
regulatory or control sequences (e.g., promoters and promoter-controlling elements, translation
initiation and termination sequences, polyadenylation sequences, splicing signals, etc.), cloning and
subcloning sites, reporter/selection or marker genes for identifying cells containing the fusion nucleic
acid, priming regions for sequencing, polymerase chain reaction, or library synthesis, and the like.
As described above, these nucleic acid sequences are operably linked such that the resulting fusion
nucleic acids are placed in a functional relationship with each other. That is, the components
described are placed in a relationship permitting them to function in their intended manner.

When the fusion nucleic acids contain separation sequences, constructing the fusion nucleic acid will
depend in part on the separation sequence employed. The separation sequence is operably linked to
the first gene of interest and second gene of interest such that the fusion nucleic acid is capable of
producing separate protein products of interest. Thus, in a preferred embodiment, the separation
sequence is placed between the first gene of interest and the second gene of interest. As will be
appreciated by those skilled in the art, use of separation sequences based on protease recognition or
Type 2A sequences requires that the fusion nucleic acid comprising the first gene of interest,
separation sequence, and second gene of interest be in frame. By "in frame" herein is meant that the
fusion nucleic acid encodes a continuous single polypeptide comprising the protein encoded by the
first gene of interest, the protein encoded by the separation sequence, and the protein encoded by the
second gene of interest. Standard recombinant DNA techniques may be used to arrange the components
of the fusion nucleic acid so that they encode a contiguous single polypeptide. Linkers may be added to the
separation sequence to facilitate the separation reactions or to limit structural interference of the
separation sequence with the genes of interest. Preferred linkers are (Gly)n linkers, where n is 1 or
more, with n being two, three, four, five, or six, although linkers of 7-10 or more amino acids are
possible.
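The in-frame requirement can be checked mechanically when a construct is designed. The Python sketch below concatenates coding segments, rejects any segment whose length is not a multiple of three (which would shift the frame of everything downstream), and rejects stop codons before the final codon. The sequences in the example are placeholders rather than real genes or a real 2A region, and the helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gly_linker(n, codon="GGT"):
    """DNA for a (Gly)n linker; GGT is one of the four glycine codons (GGN)."""
    return codon * n


def assemble_in_frame(*segments):
    """Join coding segments into a single ORF. Each segment must be a whole number
    of codons (otherwise downstream segments shift frame) and no stop codon may
    occur before the final codon."""
    for seg in segments:
        if len(seg) % 3:
            raise ValueError(f"segment of length {len(seg)} would shift the reading frame")
    orf = "".join(segments).upper()
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("in-frame stop codon before the end of the ORF")
    return orf


# Placeholder sequences standing in for real genes and a real 2A region.
gene1 = "ATGGCTAAA"        # Met-Ala-Lys
separation = "GACGTTGAA"   # placeholder codons, not an actual 2A sequence
gene2 = "CCTAGCTAA"        # Pro-Ser-stop
orf = assemble_in_frame(gene1, gly_linker(3), separation, gene2)
print(len(orf) // 3, "codons")  # 12 codons
```

Checking each segment separately, rather than only the total length, catches two compensating frame shifts that would otherwise go unnoticed.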

As is appreciated by those skilled in the art, use of IRES sequences does not require the first gene of
interest, separation sequence, and second gene of interest to be in frame, since IRES sequences
function as internal translation initiation sites. Accordingly, fusion nucleic acids using IRES elements
have the genes of interest arranged in a bicistronic structure. That is, transcription of the fusion nucleic
acid produces a bicistronic mRNA that encodes both the first gene of interest and the second gene of
interest, with the IRES element controlling translation initiation of the downstream gene of interest.
Alternatively, separate IRES sequences may control the upstream and downstream genes of interest.

Nucleic acids for making libraries of the fusion nucleic acids comprising genomic DNA or cDNA as
described herein are made by methods well known in the art. The libraries may also be directed to
specific sets of encoded protein sequences, such as protein interaction domains. These may be
synthesized using standard oligonucleotide synthesis methods, by using libraries of cloned nucleic
acids, or by multiplex PCR of nucleic acids encoding the desired polypeptide domains.

When the nucleic acids comprise libraries of random nucleic acid sequences or random encoded
peptides, these nucleic acids are preferably synthesized using known oligonucleotide synthesis
techniques. These techniques include synthetic methods well known in the art, among others
phosphoramidite, phosphoramidate, and phosphonate chemistries (see Eckstein, Oligonucleotides
and Analogues: A Practical Approach, IRL Press, Oxford University Press, 1991).
Synthesis is controlled such that nucleic acids are totally random or biased random, as more fully
described below.
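The distinction between totally random and biased random synthesis can be illustrated in silico with the Python sketch below, which draws each nucleotide from per-codon-position weights. Equal weights give a totally random sequence. As a hypothetical example of a bias, the third position is restricted to G or T ("NNK" codons), a commonly used scheme that still encodes all 20 amino acids while allowing only one stop codon (TAG). The function and constant names are assumptions, and the sketch does not model the chemistry of doped synthesis.

```python
import random

NUCLEOTIDES = "ACGT"
UNBIASED = dict.fromkeys(NUCLEOTIDES, 1.0)


def random_oligo(n_codons, position_weights=None, rng=None):
    """Generate a totally random or biased random coding oligonucleotide.

    position_weights: three dicts, one per codon position, mapping nucleotide ->
    relative weight; None means equal weights at every position."""
    rng = rng or random.Random()
    weights_by_position = position_weights or [UNBIASED] * 3
    bases = []
    for _ in range(n_codons):
        for weights in weights_by_position:
            bases.append(rng.choices(list(weights), weights=list(weights.values()))[0])
    return "".join(bases)


# Hypothetical bias: restrict the third codon position to G or T ("NNK" codons).
NNK = [UNBIASED, UNBIASED, {"G": 1.0, "T": 1.0}]
print(random_oligo(10, NNK, random.Random(0)))
```

A seeded `random.Random` instance is passed in so that a given library design can be regenerated exactly.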

Cells and cellular libraries comprising the fusion nucleic acids of the present invention are generated
by introducing the fusion nucleic acids into a plurality of cells. By a "plurality of cells" herein is meant
at least two cells, with at least 10^ being preferred, at least about 10° being particularly preferred, and
at least about 10^ and 10^ being especially preferred. This plurality of cells may comprise a cellular
library, wherein generally each cell within the library contains a member of the library, for example
different random nucleic acids, cDNAs or cDNA fragments, genomic DNA, and combinations thereof.
As will be appreciated by those skilled in the art, some cells within the library may not contain a
member of the library, and some may contain more than one. When methods other than retroviral
infection are used to introduce the fusion nucleic acids into a plurality of cells, the distribution of
candidate nucleic acids within the individual members of the cellular library may vary widely, as it is
generally difficult to control the number of nucleic acids introduced into a cell by methods such as
electroporation or transfection.

The fusion nucleic acids are introduced into cells for expressing the fusion polypeptides and for
screening, as is more fully described below. By "introduced into" or grammatical equivalents herein is
meant that the nucleic acids enter the cells in a manner suitable for subsequent expression of the
nucleic acid. The method of introduction is largely dictated by the targeted cell type. Exemplary
methods include CaPO4 precipitation, dextran sulfate transfection, liposome fusion, Lipofectin®,
electroporation, biolistic particle bombardment, microinjection, viral infection, etc. The person skilled
in the art can choose the appropriate method of introduction based on the cells and the form of the
nucleic acid being introduced. As many pharmaceutically important screens require human or model
mammalian cell targets, retroviral vectors capable of infecting such targets are preferred.

In a preferred embodiment, the vectors are retroviral vectors. Preferred retroviral vectors
include a vector based on the murine stem cell virus (MSCV) (see Hawley, R.G. et al. (1994) Gene
Ther. 1: 136-38), a modified MFG virus (Riviere, I. et al. (1995) Genetics 92: 6733-37), and
pBABE. Other suitable vectors include, among others, the LRCX retroviral vector set, pSIR retroviral
vector, pLEGFP-N1 retroviral vector, pLAPSN retroviral vector, pLXIN retroviral vector, and pLXSN
retroviral vector, all of which are commercially available (e.g., Clontech). When target cells are
non-proliferating (e.g., brain cells), useful viral vectors are derived from lentiviruses (Miyoshi, H. et al.
(1998) J. Virol. 72: 8150-57), adenoviruses (Zheng, C. et al. (2000) Nat. Biotechnol. 18: 176-80), or
alphaviruses (Ehrengruber, M.U. (1999) Proc. Natl. Acad. Sci. USA 96: 7041-46).

Preferably, the fusion nucleic acids and the library of fusion nucleic acids or candidate agents are first
cloned into a viral shuttle vector to produce a library of plasmids. A typical shuttle vector is pLNCX
(Clontech). The resulting plasmid library can be amplified in E. coli, purified, and introduced into
retroviral packaging cell lines. Suitable retroviral packaging cell lines include, but are not limited to, the
Bing and BOSC23 cell lines (WO 94/19478; Soneoka, Y. et al. (1995) Nucleic Acids Res. 23: 628-33;
Finer, M.H. et al. (1994) Blood 83: 43-50); Phoenix packaging lines such as PhiNX-ampho; 293T +
gag-pol and retrovirus envelope; PA317; and other cell lines outlined in Markowitz, D. et al. (1998)
Virology 167: 400-06 (see also Markowitz, D. et al. (1998) J. Virol. 63: 1120-24; Li, K.J. et al. (1996)
Proc. Natl. Acad. Sci. USA 93: 11658-63; and Kinsella, T.M. et al. (1996) Hum. Gene Ther. 7: 1405-



In a preferred embodiment, viruses are made by transient transfection of the cell lines referenced
above. The resulting viruses can either be used directly or be used to infect another retroviral cell line
for expansion of the library.

In a preferred embodiment, the library of virus particles is used to transfect packaging cell lines
disclosed herein to produce a primary viral library. By "primary viral library" herein is meant a library
of virus particles comprising the fusion nucleic acids of the present invention. The production of the
primary library is preferably done under conditions known in the art to reduce clone bias. The resulting
primary viral library can be titred and stored, used directly to infect a target host cell line, or used to
infect another retroviral producer cell line for "expansion" of the library.

Virus concentration may be determined as follows. Generally, retroviruses are titred by applying
retrovirus-containing supernatant onto indicator cells, such as NIH3T3 cells, and then measuring the percentage
of cells expressing phenotypic consequences of infection. The concentration of virus is determined by
multiplying the percentage of cells infected by the dilution factor involved and taking into account the
number of target cells available, to obtain a relative titre. If the retrovirus contains a reporter gene, such
as lacZ, then infection, integration, and expression of the recombinant virus are measured by histological
staining for lacZ expression or by flow cytometry (i.e., FACS analysis). In general, retroviral titres
generated from even the best of the producer cells do not exceed 10^ per ml unless concentrated, for
example by centrifugation and ultrafiltration. However, flow-through transduction methods can provide
up to ten-fold higher infectivity by infecting cells on a porous membrane and allowing retrovirus
supernatant to flow past the cells. This provides the capability of generating retroviral titres higher
than those achieved by concentration (see Chuck, A.S. (1996) Hum. Gene Ther. 7: 743-50).
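The titre calculation described above (percentage infected, number of target cells, dilution factor) can be written out as a short Python function. It assumes that each infected indicator cell reflects one infectious particle, which holds only when a small fraction of the cells is infected, and that the titre is expressed per ml of undiluted supernatant; the function name and the example numbers are hypothetical.

```python
def relative_titre(percent_infected, n_target_cells, dilution_factor, volume_ml):
    """Infectious units per ml from an indicator-cell assay: infected cells =
    fraction infected x target cells, scaled by the supernatant dilution and
    divided by the volume of diluted supernatant applied."""
    infected_cells = (percent_infected / 100.0) * n_target_cells
    return infected_cells * dilution_factor / volume_ml


# e.g. 20% of 1e5 NIH3T3 cells infected by 1 ml of a 1:10 dilution
print(relative_titre(20, 1e5, 10, 1.0))  # 200000.0
```

At higher infection percentages, cells receiving more than one particle are counted only once, so this estimate becomes an underestimate; using a dilution that infects a small fraction of the cells avoids that bias.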

To obtain the secondary viral library, host cells are preferably infected with a multiplicity of infection
(MOI) of 10. By "secondary viral library" herein is meant a library of retroviral particles expressing the
claimed fusion nucleic acids and candidate agents described herein.
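What an MOI of 10 implies for the cells can be estimated in Python, assuming infection events are independent so that the number of particles per cell follows a Poisson distribution with mean equal to the MOI. That distribution is a standard approximation, not something the text specifies, and the function name is hypothetical. It also bears on the earlier observation that some cells in a library receive more than one member.

```python
import math


def infection_profile(moi):
    """Poisson approximation of how particles distribute over cells at a given
    multiplicity of infection: fractions receiving 0, exactly 1, or 2+ particles."""
    p0 = math.exp(-moi)
    p1 = moi * math.exp(-moi)
    return {"uninfected": p0, "single": p1, "multiple": 1.0 - p0 - p1}


for moi in (0.1, 1, 10):
    profile = infection_profile(moi)
    print(moi, {k: round(v, 4) for k, v in profile.items()})
```

Under this model, nearly every host cell is infected at an MOI of 10, and most receive several particles. A low MOI is the usual choice when one library member per cell is wanted.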

As will be appreciated by those in the art, the viral libraries described above are used to produce the
cellular libraries of the present invention. The types of cells used in the present invention can vary
widely. Essentially any mammalian cell may be used, including preferred cell types from mouse, rat,
primate, and human. As is more fully described below, cell
types implicated in a wide variety of disease conditions are particularly useful, so long as a suitable
screen may be designed to allow the selection of cells that exhibit an altered phenotype as a
consequence of treating the cells with candidate agents. Modification of the system by pseudotyping
allows all eukaryotic cells to be used, preferably those of higher
eukaryotes (Morgan, R.A. et al. (1993) J. Virol. 67: 4712-21; Yang, Y. et al. (1995) Hum. Gene Ther. 6:
1203-13).

The fusion nucleic acids are introduced into a host cell, which is treated under the appropriate
conditions to induce or cause expression of the fusion protein. As described above, various expression
vectors may be made for introducing the fusion nucleic acids into a variety of organisms, including
prokaryotic and eukaryotic organisms. Appropriate host cells include bacteria, archaebacteria, yeast,
fungi, worms, plants, insect cells, and animal cells, including fish and mammalian cells. For example,
bacterial host cells include Bacillus subtilis, Escherichia coli, Streptococcus cremoris, Streptococcus
lividans, Haemophilus influenzae, etc. Yeast cells include Saccharomyces cerevisiae, Candida albicans,
Candida maltosa, Hansenula polymorpha, Kluyveromyces fragilis, Kluyveromyces lactis, Pichia
guillerimondii, Schizosaccharomyces pombe, and Yarrowia lipolytica. Appropriate insect cells include
Lepidoptera cell lines, such as Spodoptera frugiperda (e.g., Sf9) or Trichoplusia ni. However, those
skilled in the art will recognize the applicability of other insect cell systems, such as the silkworm
Bombyx mori, Drosophila cells (Schneider 2, Kc, BG2-C6, and Shi), A. albopictus, A. aegypti,
Choristoneura fumiferana, Heliothis virescens, Heliothis zea, Orgyia pseudotsugata, Lymantria dispar,
Plutella xylostella, Malacosoma disstria, Pieris rapae, Mamestra configurata, and Hyalophora cecropia,
which may also be used. In another preferred embodiment, live insects are used to express the proteins
of the present invention. Larvae are the preferred form for expressing the desired product, including
the larvae of Manduca sexta, Bombyx mori, Drosophila, and the like, which are susceptible to infection
by recombinant insect viruses.

In a preferred embodiment, the fusion nucleic acids are expressed in mammalian cells. Basically, any
mammalian cells may be used, with mouse, rat, primate and human cells being particularly preferred,
although, as will be appreciated by those in the art, when retroviral vectors are used, mammalian cells
in which the library of retroviral vectors is made are preferred.

In a preferred embodiment, cell types implicated in a wide variety of disease conditions are
particularly useful when screens, as described below, are designed for selecting cells that exhibit an
altered phenotype as a consequence of expression of a gene of interest, for example a random peptide,
within the cell. Accordingly, suitable cell types include, but are not limited to, tumor cells of all
types (particularly melanoma, myeloid leukemia, carcinomas of the lung, breast, ovaries, colon, kidney,
prostate, pancreas and testes), cardiomyocytes, endothelial cells, epithelial cells, lymphocytes (T cells
and B cells), mast cells, eosinophils, vascular intimal cells, hepatocytes, leukocytes including
mononuclear leukocytes, stem cells such as haemopoietic, neural, skin, lung, kidney, liver and myocyte
stem cells (for use in screening for differentiation and de-differentiation factors), osteoclasts,
chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells,
and adipocytes. Suitable cells also include known research cells, including, but not limited to,
Jurkat-E cells, NIH3T3 cells, CHO, COS, etc. (see the ATCC cell line catalog, hereby expressly
incorporated by reference).

To provide those skilled in the art the tools to use the present invention, the nucleic acids and cells
of the present invention are assembled into kits. The components included in the kits may comprise the
fusion nucleic acids (e.g., expression vectors or libraries), enzymatic reagents for making the fusion
nucleic acid constructs, cells for packaging and amplification of viruses, and reagents for
transfection and transduction into target cells. Alternatively, the kits contain libraries of fusion
nucleic acids capable of being introduced into cells and/or contain cells already stably expressing the
fusion nucleic acids (e.g., via integration of the retroviruses into the cellular chromosome).

In the present invention, the fusion nucleic acids and cells comprising the fusion nucleic acids find
use in screens for candidate agents producing an altered cellular phenotype. By "candidate agent" or
"candidate small molecules" or "candidate expression products" herein is meant an agent or expression
product which may be tested for the ability to alter the phenotype of a cell.

Candidate bioactive agents encompass numerous chemical classes, though typically they are organic
molecules, preferably small organic compounds having a molecular weight of more than 100 and less than
about 2,500 daltons. Candidate agents comprise functional groups necessary for structural interaction
with proteins, particularly hydrogen bonding, and typically include at least an amine, carbonyl,
hydroxyl, or carboxyl group, preferably at least two of these functional chemical groups. The
candidate agents often comprise cyclical carbon or heterocyclic structures, and/or aromatic or
polyaromatic structures substituted with one or more of the above functional groups. Candidate agents
are also found among biomolecules including peptides, saccharides, fatty acids, steroids, purines,
pyrimidines, derivatives, structural analogs or combinations thereof. Particularly preferred are
proteins, candidate drugs, and other small molecules.

Candidate agents are obtained from a wide variety of sources, including libraries of synthetic or
natural compounds. For example, numerous means are available for random and directed synthesis of a
wide variety of organic compounds and biomolecules, including expression of randomized
oligonucleotides (see for example, Gallop, M.A. et al. (1994) J. Med. Chem. 37: 1233-51; Gordon, E.M.
et al. (1994) J. Med. Chem. 37: 1385-401; Thompson, L.A. et al. (1996) Chem. Rev. 96: 555-600;
Balkenhol, F. et al. (1996) Angew. Chem. Int. Ed. 35: 2288-337; and Gordon, E.M. et al. (1996) Acc.
Chem. Res. 29: 444-54). Alternatively, libraries of natural compounds in the form of bacterial, fungal,
plant and animal extracts are available or readily produced. Additionally, natural or synthetically
produced libraries and compounds are readily modified through conventional chemical, physical, and
biochemical means. Known pharmacological agents may be subjected to directed or random chemical
modifications such as acylation, alkylation, esterification, and amidification to produce structural
analogs.

The candidate agents can be pesticides, insecticides or environmental toxins; chemicals (including
solvents, polymers, organic molecules, etc.); therapeutic molecules (including therapeutic and abused
drugs, antibiotics, etc.); biomolecules (including hormones, cytokines, proteins, lipids, carbohydrates,
cellular membrane antigens and receptors (neural, hormonal, nutrient, and cell surface receptors) or
their ligands, etc.); whole cells (including prokaryotic and eukaryotic cells, including pathogenic
cells and mammalian tumor cells); viruses (including retroviruses, herpesviruses, adenoviruses,
lentiviruses, etc.); and spores (e.g., fungal, bacterial, etc.).

In a preferred embodiment, the candidate agents are proteins. By "protein" herein is meant at least two
covalently attached amino acids, which includes proteins, polypeptides, oligopeptides and peptides. The
protein may be made up of naturally occurring amino acids and peptide bonds, or synthetic
peptidomimetic structures. Thus, "amino acid" or "peptide residue" as used herein means both naturally
occurring and synthetic amino acids. For example, homophenylalanine, citrulline, and norleucine are
considered amino acids for the purposes of the invention. "Amino acids" also includes imino residues
such as proline and hydroxyproline. The side chains may be in either the (R) or (S) configuration. In
the preferred embodiment, the amino acids are in the (S) or L configuration. If non-naturally occurring
side chains are used, non-amino acid substituents may be used, for example to prevent or retard in vivo
degradation. Proteins including non-naturally occurring amino acids may be synthesized or, in some
cases, made by recombinant techniques (see van Hest, J.C. et al. (1998) FEBS Lett. 428: 68-70 and Tang
et al. (1999) Abstr. Pap. Am. Chem. Soc. 218: U138-U138 Part 2, both of which are expressly
incorporated by reference herein).

In a preferred embodiment, the candidate bioactive agents are naturally occurring proteins or fragments
of naturally occurring proteins. Thus, for example, cellular extracts containing proteins, or random or
directed digests of proteinaceous cellular extracts, may be used. In this way, libraries of prokaryotic
and eukaryotic proteins may be made for screening in the systems described herein. Particularly
preferred in this embodiment are libraries of bacterial, fungal, viral, and mammalian proteins, with the
latter being preferred, and human proteins being especially preferred.

Candidate agents may encompass a variety of peptidic agents. These include, but are not limited to,
(1) immunoglobulins, particularly IgEs, IgGs and IgMs, and particularly therapeutically or
diagnostically relevant antibodies, including but not limited to, for example, antibodies to human
albumin, apolipoproteins (including apolipoprotein E), human chorionic gonadotropin, cortisol,
α-fetoprotein, thyroxin, thyroid stimulating hormone (TSH), antithrombin, antibodies to pharmaceuticals
(including antiepileptic drugs (phenytoin, primidone, carbamazepine, ethosuximide, valproic acid, and
phenobarbital), cardioactive drugs (digoxin, lidocaine, procainamide, and disopyramide),
bronchodilators (theophylline), antibiotics (chloramphenicol, sulfonamides), antidepressants,
immunosuppressants, and abused drugs (amphetamine, methamphetamine, cannabinoids, cocaine and
opiates)) and antibodies to any number of viruses (including orthomyxoviruses (e.g., influenza virus),
paramyxoviruses (e.g., respiratory syncytial virus, mumps virus, measles virus), adenoviruses,
rhinoviruses, coronaviruses, reoviruses, togaviruses (e.g., rubella virus), parvoviruses, poxviruses
(e.g., variola virus, vaccinia virus), enteroviruses (e.g., poliovirus, coxsackievirus), hepatitis
viruses (including A, B and C), herpesviruses (e.g., herpes simplex virus, varicella-zoster virus,
cytomegalovirus, Epstein-Barr virus), rotaviruses, Norwalk viruses, hantavirus, arenavirus, rhabdovirus
(e.g., rabies virus), retroviruses (including HIV, HTLV-I and -II), papovaviruses (e.g., papillomavirus),
polyomaviruses, and picornaviruses, and the like), and bacteria (including a wide variety of pathogenic
and non-pathogenic prokaryotes of interest including Bacillus; Vibrio, e.g., V. cholerae; Escherichia,
e.g., enterotoxigenic E. coli; Shigella, e.g., S. dysenteriae; Salmonella, e.g., S. typhi;
Mycobacterium, e.g., M. tuberculosis, M. leprae; Clostridium, e.g., C. botulinum, C. tetani,
C. difficile, C. perfringens; Corynebacterium, e.g., C. diphtheriae; Streptococcus, e.g., S. pyogenes,
S. pneumoniae; Staphylococcus, e.g., S. aureus; Haemophilus, e.g., H. influenzae; Neisseria, e.g.,
N. meningitidis, N. gonorrhoeae; Yersinia, e.g., Y. pestis; Giardia, e.g., G. lamblia; Pseudomonas,
e.g., P. aeruginosa, P. putida; Chlamydia, e.g., C. trachomatis; Bordetella, e.g., B. pertussis;
Treponema, e.g., T. pallidum; and the like); (2) enzymes (and other proteins), including but not
limited to, enzymes used as indicators of or treatment for heart disease, including creatine kinase,
lactate dehydrogenase, aspartate aminotransferase, troponin T, myoglobin, fibrinogen, cholesterol,
triglycerides, thrombin, tissue plasminogen activator (tPA); pancreatic disease indicators including
amylase, lipase, chymotrypsin and trypsin; liver function enzymes and proteins including
cholinesterase, bilirubin, and alkaline phosphatase; aldolase, prostatic acid phosphatase, terminal
deoxynucleotidyl transferase, and bacterial and viral enzymes such as HIV protease; (3) hormones and
cytokines (many of which serve as ligands for cellular receptors) such as erythropoietin (EPO),
thrombopoietin (TPO), the interleukins (including IL-1 through IL-17), insulin, insulin-like growth
factors (including IGF-1 and -2), epidermal growth factor (EGF), transforming growth factors (including
TGF-α and TGF-β), human growth hormone, transferrin, low density lipoprotein, high density lipoprotein,
leptin, VEGF, PDGF, ciliary neurotrophic factor, prolactin, adrenocorticotropic hormone (ACTH),
calcitonin, human chorionic gonadotropin, cortisol, estradiol, follicle stimulating hormone (FSH),
thyroid-stimulating hormone (TSH), luteinizing hormone (LH), progesterone, and testosterone; and
(4) other proteins (including α-fetoprotein and carcinoembryonic antigen (CEA)).

In a preferred embodiment, the candidate bioactive agents are peptides of from about 5 to about 30
amino acids, with from about 5 to about 20 amino acids being preferred, and from about 7 to about 15
being particularly preferred. These peptides may be digests of naturally occurring proteins, as
described above, or random peptides or "biased" random peptides and peptide analogs, either chemically
synthesized or encoded by candidate nucleic acids. By "randomized" or grammatical equivalents herein is
meant that each nucleic acid and peptide consists of essentially random nucleotides and amino acids,
respectively. Generally, since these random peptides (or nucleic acids, discussed below) are chemically
synthesized, they may incorporate any amino acid or nucleotide at any position. The synthetic process
can be designed to generate randomized proteins or nucleic acids to allow the formation of all or most
of the possible combinations over the length of the sequence, thus forming a library of randomized
candidate bioactive proteinaceous agents.
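
A fully randomized peptide library of the kind described above can be modelled in a few lines of
Python. This sketch draws each position uniformly from the 20 standard amino acids and picks lengths
within the particularly preferred 7 to 15 residue range; the function names and the seed are
illustrative, and a real library would be encoded by synthesized oligonucleotides rather than sampled
in software.

```python
import random

# The 20 standard proteinogenic amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_peptide(length: int, rng: random.Random) -> str:
    """Fully randomized peptide: any residue is allowed at any position."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

def random_library(size: int, min_len: int = 7, max_len: int = 15,
                   seed: int = 0) -> list[str]:
    """Sample `size` random peptides with lengths in [min_len, max_len]."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [random_peptide(rng.randint(min_len, max_len), rng) for _ in range(size)]

for peptide in random_library(5):
    print(peptide)
```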

In one preferred embodiment, the library is fully randomized, with no sequence preferences or constants
at any position. In another preferred embodiment, the library is biased; that is, some positions within
the sequence are either held constant or are selected from a limited number of possibilities. For
example, in a preferred embodiment, the nucleotides or amino acid residues are randomized within a
defined class, for example hydrophobic amino acids, hydrophilic residues, sterically biased (either
small or large) residues, or residues for crosslinking (i.e., cysteines) or phosphorylation sites
(i.e., serines, threonines, tyrosines, or histidines).
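
Biased randomization can be sketched with a position template in which some positions are fixed and
others are drawn from a restricted residue class. The template syntax and the class groupings below
are assumptions made for illustration (hydrophobicity groupings vary between sources); the example
fixes flanking cysteines, one of the crosslinking biases mentioned above.

```python
import random

# Residue classes for biased positions (illustrative groupings, not a standard).
RESIDUE_CLASSES = {
    "X": "ACDEFGHIKLMNPQRSTVWY",  # unrestricted position
    "h": "AILMFVW",               # hydrophobic
    "p": "STNQDEKRH",             # hydrophilic / polar
    "s": "GAS",                   # sterically small
    "o": "STYH",                  # potential phosphorylation sites
}

def biased_peptide(template: str, rng: random.Random) -> str:
    """Expand a template: class codes are randomized within their class;
    any other character is a residue held constant at that position."""
    residues = []
    for symbol in template:
        pool = RESIDUE_CLASSES.get(symbol)
        residues.append(rng.choice(pool) if pool else symbol)
    return "".join(residues)

rng = random.Random(7)
# Cysteines fixed at both ends (crosslinking), hydrophobic core, one polar position.
for _ in range(3):
    print(biased_peptide("CXXhhhpXXC", rng))
```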

In a preferred embodiment, the bias is toward peptides or nucleic acids that interact with known classes
of molecules. For example, it is known that much of intracellular signaling is carried out by short
regions of polypeptide interacting with other polypeptide regions of other proteins, such as the
interaction domains described above. Another example of an interaction domain is a short region from
the HIV-1 envelope cytoplasmic domain that has been previously shown to block the action of cellular
calmodulin. Regions of the Fas cytoplasmic domain, which shows homology to the mastoparan toxin from
wasps, can be limited to a short peptide region with death-inducing apoptotic or G protein-inducing
functions. Magainin, a natural peptide derived from Xenopus, can have potent anti-tumor and
antimicrobial activity. Short peptide fragments of a protein kinase C isozyme (β-PKC) have been shown to
block nuclear translocation of PKC in Xenopus oocytes following stimulation. In addition, short SH-3
target proteins have been used as pseudosubstrates for specific binding to SH-3 proteins. This is of
course a short list of available peptides with biological activity, as the literature is dense in this
area. Thus, there is much precedent for the potential of small peptides to have activity on
intracellular signaling cascades. In addition, agonists and antagonists of any number of molecules may
be used as the basis of biased randomization of candidate bioactive agents as well.

Thus, a number of molecules or protein domains are suitable as starting points for generating biased
candidate agents. A large number of small molecule domains are known that confer a common function,
structure or affinity. These include the protein-protein interaction domains and nucleic acid
interaction domains described above. As is appreciated by those in the art, while variations of these
protein-protein or protein-nucleic acid domains may have weak amino acid homology, the variants may have
strong structural homology.

In another preferred embodiment, the candidate agents are nucleic acids. By "nucleic acid" or
"oligonucleotide" or grammatical equivalents herein is meant at least two nucleotides covalently linked
together. A nucleic acid of the present invention will generally contain phosphodiester bonds, although
in some cases, as outlined below, nucleic acid analogs are included that may have alternate backbones,
comprising, for example, phosphoramide (Beaucage, S.L. et al. (1993) Tetrahedron 49: 1925-63 and
references therein; Letsinger, R.L. et al. (1970) J. Org. Chem. 35: 3800-03; Sprinzl, M. et al. (1977)
Eur. J. Biochem. 81: 579-89; Letsinger, R.L. et al. (1986) Nucleic Acids Res. 14: 3487-99; Sawai et al.
(1984) Chem. Lett. 805; Letsinger, R.L. et al. (1988) J. Am. Chem. Soc. 110: 4470; and Pauwels et al.
(1986) Chemica Scripta 26: 141-49), phosphorothioate (Mag, M. et al. (1991) Nucleic Acids Res. 19:
1437-41; and U.S. Pat. No. 5,644,048), phosphorodithioate (Brill et al. (1989) J. Am. Chem. Soc. 111:
2321), O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical
Approach, Oxford University Press, 1991), and peptide nucleic acid backbones and linkages (Egholm, M.
(1992) J. Am. Chem. Soc. 114: 1895-97; Meier et al. (1992) Chem. Int. Ed. Engl. 31: 1008; Egholm, M.
(1993) Nature 365: 566-68; Carlsson, C. et al. (1996) Nature 380: 207, all of which are incorporated by
reference). Other analog nucleic acids include those with positive backbones (Dempcy, R.O. et al.
(1995) Proc. Natl. Acad. Sci. USA 92: 6097-101); non-ionic backbones (U.S. Pat. Nos. 5,386,023,
5,637,684, 5,602,240, 5,216,141 and 4,469,8^; Kiedrowski et al. (1991) Angew. Chem. Intl. Ed. English
30: 423; Letsinger, R.L. et al. (1988) J. Am. Chem. Soc. 110: 4470; Letsinger, R.L. et al. (1994)
Nucleoside & Nucleotide 13: 1597; Chapters 2 and 3, ACS Symposium Series 580, "Carbohydrate
Modifications in Antisense Research", Ed. Y.S. Sanghvi and P. Dan Cook; Mesmaeker et al. (1994)
Bioorganic & Medicinal Chem. Lett. 4: 395; Jeffs et al. (1994) J. Biomolecular NMR 34: 17; (1996)
Tetrahedron Lett. 37: 743) and non-ribose backbones, including
those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series
580, "Carbohydrate Modifications in Antisense Research", Ed. Y.S. Sanghvi and P. Dan Cook. Nucleic
acids containing one or more carbocyclic sugars are also included within the definition of nucleic
acids (see Jenkins et al. (1995) Chem. Soc. Rev. 169-76). Several nucleic acid analogs are described in
Rawls, C & E News, June 2, 1997, page 35. All of these references are hereby expressly incorporated by
reference. These modifications of the ribose-phosphate backbone may be made to facilitate the addition
of additional moieties, such as labels, or to increase the stability and half-life of such molecules in
physiological environments. In addition, mixtures of different nucleic acid analogs, and mixtures of
naturally occurring nucleic acids and analogs, may be made. The nucleic acids may be single stranded or
double stranded, as specified, or contain portions of both double stranded and single stranded
sequence. The nucleic acid may be DNA (both genomic and cDNA), RNA or a hybrid, where the nucleic acid
contains any combination of deoxyribo- and ribonucleotides, and any combination of bases, including
uracil, adenine, thymine, cytosine, guanine, xanthine, hypoxanthine, isocytosine, isoguanine, etc.,
although generally occurring bases are preferred.

In a preferred embodiment, the candidate nucleic acids comprise cDNAs, including cDNA libraries, or
fragments of cDNAs. The cDNAs can be derived from any number of different cells and include cDNAs
generated from eukaryotic and prokaryotic cells, viruses, cells infected with viruses or other
pathogens, genetically altered cells, cells with defective cellular processes, etc. Preferred
embodiments include cDNAs made from different individuals, such as different patients, particularly
human patients. The cDNAs may be complete libraries or partial libraries. Furthermore, the candidate
nucleic acids can be derived from a single cDNA source or multiple sources; that is, cDNA from multiple
cell types or multiple individuals or multiple pathogens can be combined in a screen. In other aspects,
the cDNA may encode specific domains, such as signaling domains, protein interaction domains, membrane
binding domains, targeting domains, etc. The cDNAs may utilize entire cDNA constructs or fractionated
constructs, including random or targeted fractionation. Suitable fractionation techniques include
enzymatic (e.g., DNase I, restriction nucleases, etc.), chemical, or mechanical fractionation (e.g.,
sonication or shearing). Also useful for the present invention are cDNA libraries enriched for a
specific class of proteins, such as type I membrane proteins (Tashiro, K. et al. (1993) Science 261:
600-03) and membrane proteins (Kopczynski, C.C. (1998) Proc. Natl. Acad. Sci. USA 95: 9973-78).
Additionally useful are subtracted cDNA libraries, in which genes preferentially or exclusively
expressed in particular cells, tissues, or developmental phases are enriched. Methods for making
subtracted cDNA libraries are well known in the art (see Diatchenko, L. et al. (1999) Methods Enzymol.
303: 349-80; von Stein, O.D. et al. (1997) Nucleic Acids Res. 13: 2598-602; Carcind, P. (2000) Genome
Res. 10: 1431-32). Accordingly, a cDNA library may be a complete cDNA library from a cell, a partial
library, an enriched library from one or more cell types, or a constructed library with certain cDNAs
removed from a library.
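
The idea of a subtracted library, in which clones preferentially expressed in one source are enriched
relative to another, can be expressed as a simple in silico filter over clone abundances. This is only
an analogy for illustration: physical subtraction is done by hybridization (see the methods cited
above), and the clone names, counts, fold threshold and pseudocount here are hypothetical. The sketch
also assumes the two libraries were sampled to comparable depth.

```python
def subtract_library(tester: dict[str, int], driver: dict[str, int],
                     min_fold: float = 5.0, pseudocount: float = 1.0) -> list[str]:
    """Return clone IDs at least min_fold more abundant in tester than in driver.

    The pseudocount keeps clones absent from the driver (exclusively
    expressed in the tester) from causing a division by zero.
    """
    enriched = []
    for clone, count in tester.items():
        fold = (count + pseudocount) / (driver.get(clone, 0) + pseudocount)
        if fold >= min_fold:
            enriched.append(clone)
    return sorted(enriched)

tester = {"geneA": 50, "geneB": 10, "geneC": 9}  # hypothetical clone counts
driver = {"geneA": 2, "geneB": 12}
print(subtract_library(tester, driver))  # ['geneA', 'geneC']
```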

In another preferred embodiment, the candidate nucleic acids comprise genomic nucleic acids, including
organellar nucleic acids. As elaborated above for cDNAs, the genomic nucleic acids may be derived from
any number of different cells, including genomic nucleic acids of eukaryotes, prokaryotes, or viruses.
They may be from normal cells or from cells defective in cellular processes, such as tumor suppression,
cell cycle control, or cell surface adhesion. Moreover, the genomic nucleic acids may be obtained from
cells infected with pathogenic organisms, for example cells infected with viruses or bacteria. The
genomic nucleic acids comprise entire genomic nucleic acid constructs or fractionated constructs,
including random or targeted fractionation as described above. Generally, for genomic nucleic acids and
cDNAs, the candidate nucleic acids may range from nucleic acid lengths capable of encoding proteins of
twenty to thousands of amino acid residues, with from about 50-1000 being preferred and from about
100-500 being especially preferred. In addition, candidate agents comprising cDNA or genomic nucleic
acids may also be subsequently mutated using known techniques (e.g., exposure to mutagens, error-prone
PCR, error-prone transcription, or combinatorial splicing (e.g., cre-lox recombination)) to generate
novel nucleic acid sequences (or protein sequences). In this way, libraries of prokaryotic and
eukaryotic nucleic acids may be made for screening in the systems described herein. Particularly
preferred in these embodiments are libraries of bacterial, fungal, viral and mammalian nucleic acids,
with the latter being preferred, and human nucleic acids being especially preferred.
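
The preferred size ranges above translate into fragment lengths via three nucleotides per codon, so
fragments encoding about 100-500 residues span roughly 300-1,500 nucleotides. The following Python
sketch simulates random fractionation of a hypothetical 10 kb source sequence and keeps the fragments
in that window; it ignores reading frame, stop codons and sequence-dependent breakage bias, all of
which matter in practice.

```python
import random

def random_fragments(sequence_length: int, n_breaks: int,
                     rng: random.Random) -> list[int]:
    """Simulate random fractionation: break at n_breaks distinct random
    positions and return the resulting fragment lengths (nucleotides)."""
    cuts = sorted(rng.sample(range(1, sequence_length), n_breaks))
    bounds = [0, *cuts, sequence_length]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def encodes_size_window(fragment_nt: int, min_aa: int = 100, max_aa: int = 500) -> bool:
    """True if the fragment length matches min_aa..max_aa residues (3 nt per codon)."""
    return 3 * min_aa <= fragment_nt <= 3 * max_aa

rng = random.Random(0)
fragments = random_fragments(10_000, 15, rng)
selected = [f for f in fragments if encodes_size_window(f)]
print(f"{len(selected)} of {len(fragments)} fragments fall in the 300-1500 nt window")
```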

In another preferred embodiment, the candidate nucleic acids comprise libraries of random nucleic acids.
Generally, the random nucleic acids are fully randomized or they are biased in their randomization,
e.g., in nucleotide/residue frequency generally or per position. As defined above, by "randomized" or
grammatical equivalents herein is meant that each nucleic acid consists essentially of random
nucleotides. Since the candidate nucleic acids are chemically synthesized, they may incorporate any
nucleotide at any position. In the expressed random nucleic acid, at least 10, preferably at least 12,
more preferably at least 15, and most preferably at least 21 nucleotide positions need to be
randomized. The candidate nucleic acids may also comprise nucleic acid analogs as described above.
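
The link between randomized nucleotide positions and encoded peptides can be made concrete: 21
randomized positions correspond to seven codons, and fully random codons will sometimes be stops. The
Python sketch below uses the standard genetic code; the helper names are illustrative and the random
oligonucleotide is simulated, not a sequence from this disclosure.

```python
import itertools
import random

BASES = "TCAG"
# Standard genetic code in TTT, TTC, TTA, TTG, TCT, ... GGG order ('*' = stop).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(itertools.product(BASES, repeat=3), AA_TABLE)}

def random_oligo(n_positions: int, rng: random.Random) -> str:
    """Fully randomized DNA oligonucleotide (equal base frequencies)."""
    return "".join(rng.choice("ACGT") for _ in range(n_positions))

def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence codon by codon."""
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3 for in-frame translation")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

rng = random.Random(42)
oligo = random_oligo(21, rng)
print(oligo, "->", translate(oligo))
print("possible 21-mers:", 4 ** 21)  # 4398046511104
# With 3 of 64 codons being stops, P(no stop in 7 random codons) = (61/64)**7, about 0.71.
print(f"P(no in-frame stop over 7 codons): {(61 / 64) ** 7:.2f}")
```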

For candidate nucleic acids encoding peptides, the candidate nucleic acids generally contain cloning
sites which are placed to allow in-frame expression of the randomized peptides and any fusion partners,
if present, such as presentation structures and the GFPs of the present invention. For example, when
presentation structures are used, the presentation structure will generally contain the initiating ATG
as part of the parent vector. For candidate agents comprising RNAs, in addition to chemically
synthesized RNA nucleic acids, the candidate nucleic acids may be expressed from vectors, including
retroviral vectors. Thus, when the RNAs are expressed, vectors expressing the candidate nucleic acids
are generally constructed with an internal promoter (e.g., the CMV promoter), a tRNA promoter, a
cell-specific promoter, or hybrid promoters designed for immediate and appropriate expression of the
RNA structure at the initiation site of RNA synthesis. For retroviral vectors, the RNA may be expressed
antisense to the direction of retroviral synthesis and terminated as known, for example with
orientation-specific terminator sequences. Interference from native viral promoter-initiated
transcription may be minimized in the target cell by using the SIN vectors described herein.
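
For peptide-encoding inserts, keeping the downstream fusion partner (for example, a GFP) in frame with
the initiating ATG requires that the insert between the cloning sites be a whole number of codons with
no in-frame stop codon. A minimal Python check under that assumption, with a hypothetical function
name, might look like this; it also assumes the cloning sites themselves were designed to be in frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def keeps_fusion_in_frame(insert: str) -> bool:
    """Return True if an insert cloned between in-frame sites preserves the
    reading frame of a downstream fusion partner.

    Requires a length that is a multiple of 3 and no in-frame stop codon.
    """
    insert = insert.upper()
    if len(insert) % 3:
        return False
    codons = (insert[i:i + 3] for i in range(0, len(insert), 3))
    return not any(codon in STOP_CODONS for codon in codons)

print(keeps_fusion_in_frame("GCTGCAGCC"))  # True: three sense codons
print(keeps_fusion_in_frame("GCTGC"))      # False: 5 nt shifts the frame
print(keeps_fusion_in_frame("GCTTAAGCC"))  # False: TAA stop in frame
```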

When the nucleic acids are expressed in the cells, they may or may not encode a protein as described
herein. Thus, included within candidate nucleic acids of the present invention are RNAs capable of
producing an altered phenotype. In one aspect, the nucleic acid may be an antisense nucleic acid
directed towards a complementary target nucleic acid. As is well known in the art, antisense nucleic
acids find use in suppressing or affecting expression of various genes of pathogenic organisms or
expression of cellular genes. These uses include suppression of oncogenes to affect the proliferative
properties of transformed cells (Martiat, P. et al. (1993) Blood 81: 502-09; Daniel, R. (1995)
Oncogene 10: 1607-14; Niemeyer, C.C. (1998) Cell Death Differ. 5: 440-49), modulation of the cell cycle
(Skotr, M. et al. (1995) Cancer Res. 55: 5493-98), inhibition of proteins involved in cardiovascular
disease states (Wang, H. (1999) Circ. Res. 85: 614-22) and inhibition of viral pathogenesis (Lo, K.M.
et al. (1992) Virology 190: 176-83; Chatterjee, S. et al. (1992) Science 258: 1485-88).
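
An antisense RNA is the antiparallel complement of its target. The Python sketch below derives it from
a target mRNA segment written 5' to 3' in the RNA alphabet; the function name and the example sequence
are illustrative, and practical antisense design also weighs target accessibility, specificity and
chemistry, which are not modelled here.

```python
# A<->U and C<->G pairing for RNA; lowercase kept for soft-masked input.
RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")

def antisense_rna(target_mrna: str) -> str:
    """Return the antisense strand (5'->3') for a target mRNA segment given 5'->3'."""
    if set(target_mrna.upper()) - set("ACGU"):
        raise ValueError("expected an RNA sequence containing only A, C, G, U")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return target_mrna.translate(RNA_COMPLEMENT)[::-1]

print(antisense_rna("AUGGCCAAGUUC"))  # GAACUUGGCCAU
```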

In another preferred embodiment, the candidate nucleic acids are nucleic acids capable of catalyzing
cleavage of target nucleic acids in a sequence-specific manner, preferably in the form of ribozymes.
Ribozymes include, among others, hammerhead ribozymes, hairpin ribozymes, and hepatitis delta virus
ribozymes (Tuschl, T. (1995) Curr. Opin. Struct. Biol. 5: 296-302; Usman, N. (1996) Curr. Opin.
Struct. Biol. 6: 527-33; Chowrira, B.M. et al. (1991) Biochemistry 30: 8518-22; Perrotta, A.T. et al.
(1992) Biochemistry 31: 16-21). As with antisense nucleic acids, nucleic acids catalyzing cleavage of
target nucleic acids may be directed to a variety of expressed nucleic acids, including those from
pathogenic organisms or cellular genes (see, for example, Jackson, W.H. et al. (1998) Biochem.
Biophys. Res. Commun. 245: 81-84).

Another preferred embodiment of candidate nucleic acids is double stranded RNA capable of inducing RNA
interference, or RNAi (Bosher, J.M. et al. (2000) Nat. Cell Biol. 2: E31-36). Introducing double
stranded RNA can trigger specific degradation of homologous RNA sequences, generally within the region
of identity of the dsRNA (Zamore, P.D. et al. (2000) Cell 101: 25-33). This provides a basis for
silencing expression of genes, thus permitting a method for altering the phenotype of cells. The dsRNA
may comprise synthetic RNA made either by known chemical synthetic methods or by in vitro transcription
of nucleic acid templates carrying promoters (e.g., T7 or SP6 promoters). Alternatively, the dsRNAs are
expressed in vivo, preferably by use of palindromic fusion nucleic acids that allow facile formation of
dsRNA (e.g., in the form of a hairpin) when expressed in the cell.
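
The palindromic (hairpin) design mentioned above places a sense stem, a short loop and the reverse
complement of the stem in one transcript, so the RNA folds back on itself into a double-stranded region.
The Python sketch below builds such a DNA template and checks that the two stem halves are
complementary; the loop sequence and the target are placeholders chosen for illustration only, not
sequences from this disclosure.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA sequence."""
    return seq.translate(DNA_COMPLEMENT)[::-1]

def hairpin_template(target: str, loop: str = "TTCAAGAGA") -> str:
    """Sense stem + loop + antisense stem, so the transcript can fold into a hairpin.

    The default loop is an illustrative placeholder, not a recommendation.
    """
    target = target.upper()
    return target + loop + reverse_complement(target)

def stems_pair(construct: str, stem_len: int) -> bool:
    """Check that the 3' stem is the reverse complement of the 5' stem."""
    return construct[-stem_len:] == reverse_complement(construct[:stem_len])

target = "GCAAGCTGACCCTGAAGTTC"  # hypothetical 20-nt target segment
construct = hairpin_template(target)
print(construct, stems_pair(construct, len(target)))
```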

In a preferred embodiment, a library of candidate bioactive agents is used. These include libraries of
small molecules, nucleic acids, peptides, cDNAs, genomic nucleic acids, etc. In a preferred embodiment,
for candidate agents comprising random nucleic acids and peptides, the library should provide a
sufficiently structurally diverse population of randomized expression products to effect a
probabilistically sufficient range to provide one or more peptide products which have the desired
properties, such as binding to protein interaction domains or producing a desired cellular response.
Accordingly, a library must be large enough so that at least one of its members will have a structure
that gives it affinity for some molecule, protein or other factor whose activity is involved in some
cellular response, such as signal transduction. Although it is difficult to gauge the required absolute
size of an interaction library, nature provides a hint with the immune response: a diversity of
1 0^-1 0" different antibodies provides at least one combination with sufficient affinity to interact
with most potential antigens faced by an organism. Published in vitro selection techniques have also
shown that a library size of about 10^ to 10® is sufficient to find structures with affinity for the
target. A library of all combinations of a peptide 7-20 amino acids in length, such as proposed here for
expression in retroviruses, has the potential to code for 20^7 (about 10^9) to 20^20. Thus, with
libraries of 10^ to 10* per ml of retroviral particles, the present methods allow a "working" subset of
a theoretically complete interaction library for 7 amino acids, and a subset of shapes for the 20^20
library. Thus, in a preferred embodiment, at least 10^, preferably at least 10^, more preferably at
least 10° and most preferably at least 10^ different expression products are simultaneously analyzed in
the subject methods. Preferred methods maximize library size and diversity.
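
The library-size arithmetic above can be checked directly: the number of possible peptides of length n
over the 20 standard amino acids is 20^n. The Python sketch below computes that diversity, the fraction
of sequence space a library of a given size could at most represent, and, assuming Poisson sampling,
the expected fraction of library members missed when cells are screened at a given fold-coverage of the
library. The 10^8 library size in the example is a hypothetical value chosen for illustration.

```python
import math

def theoretical_diversity(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of the given length (20**n for peptides)."""
    return alphabet_size ** length

def fraction_of_space(library_size: float, length: int) -> float:
    """Upper bound on the fraction of sequence space a library can represent."""
    return min(1.0, library_size / theoretical_diversity(length))

def expected_missed(coverage: float) -> float:
    """Expected fraction of library members never sampled when
    coverage = (cells screened) / (library size), assuming Poisson sampling."""
    return math.exp(-coverage)

for n in (7, 20):
    print(f"20^{n} = {theoretical_diversity(n):.2e}")
print(f"10^8-member library covers {fraction_of_space(1e8, 7):.1%} of 7-mer space")
print(f"missed at 3x coverage: {expected_missed(3):.1%}")
```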

The candidate bioactive agents are combined or added to a cell or population of cells or plurality of
cells. By "population of cells" or "plurality of cells" herein is meant at least two cells, with at
least about 10^ being preferred, at least about 10^ being particularly preferred, and at least about
10^, 10°, and 10^ being especially preferred.

The candidate agents and the cells are combined. As will be appreciated by those in the art, this may
be accomplished in any number of ways, including adding the candidate agents to the surface of the
cells, to the media containing the cells, or to a surface on which the cells grow or contact; or adding the
agents into the cells, for example by using vectors that will introduce agents into the cells, especially
when the agents are nucleic acids or proteins.

In a preferred embodiment, the candidate agents are either nucleic acids or proteins that are
introduced into the cells to screen for candidate agents capable of altering the phenotype of a cell. By
"introduced into" or grammatical equivalents herein is meant that the nucleic acids enter the cells in a
manner suitable for subsequent expression of the nucleic acid or protein. The method of introduction
is largely dictated by the targeted cell type. Known methods include CaPO4 transfection, DEAE-dextran
transfection, liposome fusion, lipofectin®, electroporation, viral infection, biolistic particle
bombardment, etc. The candidate nucleic acids may exist either transiently or stably in the cytoplasm
or stably integrate into the genome of the host cell (e.g., by retroviral integration or homologous
recombination). When mammalian cells are used, retroviral vectors capable of transfecting such
targets are preferred.

In a preferred embodiment, the candidate bioactive agents are either nucleic acids or proteins
(proteins in this context include proteins, oligopeptides, and peptides) that are expressed in the host
cells using vectors, including viral vectors. The choice of the vector will depend on the cell type. For
example, when the cells are replicating mammalian cells, retroviral vectors are used. When the cells are
non-replicating mammalian cells, for example when arrested in one of the growth phases, viral vectors
capable of infecting non-dividing cells, including lentiviral and adenoviral vectors, are used to express
the nucleic acids and proteins.

In a preferred embodiment, the candidate bioactive agents are either nucleic acids or proteins that are
introduced into the host cells using retroviral vectors, as is generally outlined in PCT US97/01019 and
PCT US97/01048, both of which are expressly incorporated by reference. Generally, a library is
generated using a retroviral vector backbone; standard oligonucleotide synthesis is done to generate
either the candidate agent or nucleic acid encoding a protein, for example a random peptide, using
techniques well known in the art. After generating the nucleic acid library, the library is cloned into a
first primer, which serves as a cassette for insertion into the retroviral construct. The first primer
generally contains additional elements, including, for example, the required regulatory sequences (e.g.,
translation, transcription, promoters, etc.), fusion partners, restriction endonuclease sites, stop codons,
and regions of complementarity for second strand priming.

A second primer is then added, which generally consists of some or all of the complementarity region
to prime the first primer and optional sequences necessary for a second unique restriction site for
purposes of subcloning. Extension with DNA polymerase results in double-stranded oligonucleotides,
which are then cleaved with appropriate restriction endonucleases and subcloned into the target
retroviral vectors.
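The primer-based cassette assembly described above can be sketched as string operations on DNA sequences. This is a minimal, hypothetical illustration: the NNK codon scheme (N = any base, K = G or T) is a common choice for random peptide libraries but is not specified in the text, and the restriction sites (BamHI GGATCC, XhoI CTCGAG), stop codon, and complementarity region are placeholders chosen for the example.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Placeholder elements for illustration only (hypothetical choices):
SITE_1 = "GGATCC"            # BamHI recognition site (5' cloning site)
SITE_2 = "CTCGAG"            # XhoI recognition site (second unique site)
STOP = "TAA"                 # stop codon closing the random region
COMP = "ACCGTTAGCAGTCAAG"    # complementarity region for second-strand priming


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def random_nnk(codons: int, rng: random.Random) -> str:
    """Random coding region of NNK codons (N = A/C/G/T, K = G/T)."""
    return "".join(
        rng.choice("ACGT") + rng.choice("ACGT") + rng.choice("GT")
        for _ in range(codons)
    )


def first_primer(codons: int, rng: random.Random) -> str:
    """5' site + random region + stop codon + complementarity region."""
    return SITE_1 + random_nnk(codons, rng) + STOP + COMP


def second_primer() -> str:
    """5' tail carrying the second site, then the sequence annealing to COMP."""
    return SITE_2 + revcomp(COMP)


def extend(primer1: str, primer2: str) -> tuple[str, str]:
    """Model polymerase fill-in of the two annealed primers.

    Returns the (top, bottom) strands of the double-stranded product."""
    anneal = revcomp(COMP)
    if not primer2.endswith(anneal) or not primer1.endswith(COMP):
        raise ValueError("primers do not share the complementarity region")
    tail = primer2[: len(primer2) - len(anneal)]
    top = primer1 + revcomp(tail)   # primer1 extended across primer2's tail
    return top, revcomp(top)


if __name__ == "__main__":
    rng = random.Random(0)
    top, bottom = extend(first_primer(7, rng), second_primer())
    print(top)
```

In practice the random region should also be checked for internal copies of the cloning sites, which would be cleaved during subcloning; the sketch does not perform that check.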

Any number of suitable retroviral vectors may be used. In one aspect, preferred vectors include those
based on murine stem cell virus (MSCV) (Hawley et al. (1994) Gene Therapy 1: 136), a modified
MFG virus (Riviere et al. (1995) Genetics 92: 6733), pBABE, and others described above. Well-suited
retroviral transfection systems are described in Mann et al., supra; Pear et al. (1993) Proc. Natl.
Acad. Sci. USA 90: 8392-96; Kitamura et al. Human Gene Ther. 7: 1405-1413; Hofmann et al. Proc.
Natl. Acad. Sci. USA 93: 5185-90; Choate et al. (1996) Human Gene Ther. 7: 2247; WO 94/19478; PCT
US97/01019, and references cited therein, all of which are incorporated by reference.

The vectors used to introduce candidate agents may include inducible and constitutive promoters for
the expression of the candidate agents, as described above. For example, there are situations
wherein it is necessary to induce peptide expression only during certain phases of the selection
process, such as during particular periods of the cell cycle. As described above, a large number of
constitutive and inducible promoters are well known, and may be used to regulate expression of the
candidate agents.

In a preferred embodiment, the bioactive candidate agents comprising nucleic acids and proteins are
linked to a fusion partner, as described above. In one aspect, combinations of fusion partners are
used. Any number of combinations of presentation structures, targeting sequences, rescue
sequences, and stability sequences may be used with or without linker sequences. Thus, candidate
agents, which include these components, may be used to generate a library of fragments, each
containing a different candidate nucleotide sequence (e.g., random nucleic acid, cDNA, genomic DNA,
etc.) that may encode a different peptide sequence.

In a preferred embodiment, when the candidate agent is introduced to the cells using expression
vectors, the candidate peptide agent is linked to a detectable molecule, and the methods of the
invention include at least one expression assay. Thus, the detectable molecule may comprise
reporter and selection genes as described herein. In one preferred embodiment, the detectable
molecule is distinguishable from that expressed by the fusion nucleic acid expressing a gene of
interest. An expression assay is an assay that allows the determination of whether a candidate
bioactive agent has been expressed, i.e., whether a candidate peptide agent is present in the cell.
Thus, by linking the expression of a candidate agent to the expression of a detectable molecule such
as a label, the presence or absence of the candidate peptide agent may be determined. Accordingly,
in this embodiment, the candidate agent is operably linked to a detectable molecule. Generally, this is
done by creating a fusion nucleic acid. The fusion nucleic acid comprises a first nucleic acid
expressing the candidate bioactive agent (which can include fusion partners, as outlined above), and a
second nucleic acid expressing a detectable molecule. In a preferred embodiment, the fusion nucleic
acid encodes a fusion polypeptide comprising the candidate agent and the detectable molecule. In
another preferred embodiment, the fusion nucleic acid may use one promoter for the first nucleic acid
and a second promoter for the second nucleic acid to produce separate nucleic acids comprising a
candidate nucleic acid, which may or may not encode a protein, and the detectable molecule. In yet
another preferred embodiment, the fusion nucleic acid may use separation sequences described
herein to express separate candidate bioactive agent and detectable molecule. The terms "first" and
"second" are not meant to confer an orientation of the sequences with respect to the 5'-3' orientation
of the fusion nucleic acid. For example, assuming a 5'-3' orientation of the fusion sequence, the first
nucleic acid may be located either 5' to the second nucleic acid, or 3' to the second nucleic acid.
Preferred detectable molecules in this embodiment include, but are not limited to, various fluorescent
proteins and their variants, including A. victoria GFP, Renilla muelleri GFP, Renilla reniformis GFP,
Ptilosarcus gurneyi GFP, YFP, BFP, RFP, Anemonia majano fluorescent proteins, Zoanthus
fluorescent proteins, Discosoma striata fluorescent proteins, and Clavularia fluorescent proteins.
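The three arrangements of the fusion nucleic acid described above, and the point that "first" and "second" imply no 5'-3' order, can be captured in a small data model. This Python sketch is a hypothetical representation for planning constructs, not part of the described methods; the class, enum, and element names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Layout(Enum):
    FUSION_POLYPEPTIDE = "single ORF encoding agent and detectable molecule"
    TWO_PROMOTERS = "separate promoter for each nucleic acid"
    SEPARATION_SEQUENCE = "one transcript split by a separation sequence"


@dataclass(frozen=True)
class FusionNucleicAcid:
    first: str                     # first nucleic acid: the candidate agent
    second: str                    # second nucleic acid: the detectable molecule
    layout: Layout
    first_is_5prime: bool = True   # "first"/"second" do not fix 5'-3' order

    def elements(self) -> list[str]:
        """Element names in 5'->3' order for this hypothetical layout."""
        upstream, downstream = self.first, self.second
        if not self.first_is_5prime:
            upstream, downstream = downstream, upstream
        if self.layout is Layout.FUSION_POLYPEPTIDE:
            return ["promoter", f"{upstream}-{downstream} fusion ORF"]
        if self.layout is Layout.TWO_PROMOTERS:
            return ["promoter_1", upstream, "promoter_2", downstream]
        return ["promoter", upstream, "separation_sequence", downstream]
```

For example, FusionNucleicAcid("peptide", "rGFP", Layout.SEPARATION_SEQUENCE).elements() lists promoter, peptide, separation_sequence, rGFP; passing first_is_5prime=False places rGFP upstream of the peptide instead.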

Thus, in one preferred embodiment, the vectors used to introduce candidate agents comprise a
promoter operably linked to fusion nucleic acids encoding fusion polypeptides comprising rGFP or
pGFP, including fusions with random nucleic acids (i.e., for expressing random peptides), cDNAs, and
genomic DNA fragments. Fusions to rGFP or pGFP provide a way of monitoring expression of the
candidate agent, tracking and localization of the candidate agent, and sorting cells expressing the
candidate agents. In another aspect, a preferred embodiment comprises a vector comprising a
promoter, a first gene of interest, a separation sequence, and a second gene of interest comprising
rGFP or pGFP. The first gene of interest expresses the candidate agent, while the GFP reporter allows
monitoring of its expression. Expressing the candidate agent and the reporter separately reduces any
interference with the activity of the candidate agent that fusion to a reporter protein could cause. If the
candidate agent comprises an rGFP or pGFP fusion protein, the second gene of interest may comprise a
reporter distinguishable from the rGFP or pGFP fusion protein.

In general, the candidate agents are added to the cells, either extracellularly or intracellularly, as
outlined above, under reaction conditions that favor agent-target interactions. Generally, these will be
physiological conditions. Incubations may be performed at any temperature which facilitates optimal
activity, typically between 4 and 40°C. Incubation periods are selected for optimum activity, but may
also be optimized to facilitate rapid high throughput screening. Typically, between 0.1 and 24 hours will
be sufficient. Excess reagent is generally removed or washed away.

A variety of other reagents may be included in the assays. These include reagents like salts, neutral
proteins (e.g., albumin), detergents, synthetic polymers (polyethylene glycol, dextran sulfate), ionic
agents, etc., which may be used to facilitate optimal protein-protein binding and/or reduce non-specific
or background interactions. Reagents that otherwise improve the efficiency of the assay, such as
protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may also be used. The mixture of
components may be added in any order that provides for detection. Washing or rinsing the cells will
be done at different times, as will be appreciated by those in the art, and may include the use of
filtration and centrifugation. When second labeling moieties (also referred to herein as "secondary
labels") are used, they are preferably added after excess non-bound target molecules are removed, in
order to reduce non-specific binding. However, under some circumstances, all the components may
be added simultaneously.

As will be appreciated by those in the art, the type of cells used in the present invention can vary
widely. Basically, the screen may use any cell in which the fusion nucleic acids of the present
invention can be introduced and expressed. These include bacterial, fungal, plant, insect, and
mammalian cells. In a preferred embodiment, when the cells are mammalian cells, particularly
preferred cells are mouse, rat, primate and human cells. When the candidate agents are in the form
of retroviral vectors, the screen may use any mammalian cells in which a library of retroviral vectors
comprising the fusion nucleic acids of the present invention is made. In addition, modification of the
retroviral system by pseudotyping allows nearly all mammalian cell types to be used (see Morgan,
R.A. et al. (1993) J. Virol. 67: 4712-21; Yang, Y. et al. (1995) Hum. Gene Ther. 6: 1203-13).

As is more fully described below, a screen is set up such that the cells exhibit a selectable phenotype
in the presence of a candidate agent. For mammalian cells, cell types implicated in a wide variety of
disease conditions are particularly useful, so long as a suitable screen may be designed to allow the
selection of cells that exhibit an altered phenotype as a consequence of the presence of a candidate
bioactive agent within the cell. Accordingly, suitable cell types include, but are not limited to, tumor
cells of all types (particularly melanoma, myeloid leukemia, carcinomas of the lung, breast, ovaries,
colon, kidney, prostate, pancreas, and testes), cardiomyocytes, endothelial cells, epithelial cells,
lymphocytes (T cells and B cells), mast cells, eosinophils, vascular intimal cells, hepatocytes, leukocytes
including mononuclear leukocytes, stem cells such as hemopoietic, neural, skin, lung, kidney, liver and
myocyte stem cells (for use in screening for differentiation and de-differentiation factors), osteoclasts,
chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells,
and adipocytes. Suitable cells also include known research cells, including, but not limited to, Jurkat-E
cells, NIH3T3 cells, CHO, Cos, etc. See the ATCC cell line catalog, hereby expressly incorporated by
reference.

In one embodiment, the cells may be genetically engineered, that is, contain exogenous nucleic acids,
for example to contain target molecules.

In a preferred embodiment, a first plurality of cells is screened. That is, the cells into which the
candidate nucleic acids are introduced are screened for an altered phenotype. Thus, in this
embodiment, the effect of the bioactive candidate agent is seen in the same cells in which it is made,
i.e., an autocrine effect.

By a "plurality of cells" herein is meant roughly from about 10^3 cells to 10^8 or 10^9, with from 10^6 to
10^8 being preferred. This plurality of cells comprises a cellular library, wherein generally each cell within
the library contains a member of the retroviral molecular library, e.g., a different candidate nucleic acid,
although, as will be appreciated by those in the art, some cells within the library may not contain a
retrovirus, and some may contain more than one. When methods other than retroviral infection are
used to introduce the candidate nucleic acids into a plurality of cells, the distribution of candidate
nucleic acids within the individual cell members of the cellular library may vary widely, as it is generally
difficult to control the number of nucleic acids which enter a cell during electroporation, transfection,
etc.
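The observation that some cells receive no retrovirus while others receive more than one is commonly modeled with a Poisson distribution, P(k) = m^k e^(-m) / k!, where m is the mean multiplicity of infection (MOI). The text does not specify this model; the sketch below assumes it, and the MOI values in the example are arbitrary.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """P(a cell receives exactly k vectors) for mean multiplicity `moi`."""
    return moi ** k * math.exp(-moi) / math.factorial(k)


def infection_profile(moi: float) -> dict[str, float]:
    """Fractions of cells with no, exactly one, or more than one vector."""
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    return {"none": p0, "single": p1, "multiple": 1.0 - p0 - p1}


if __name__ == "__main__":
    for moi in (0.1, 0.3, 1.0):
        profile = infection_profile(moi)
        print(moi, {k: round(v, 3) for k, v in profile.items()})
```

Under this model, lowering the MOI reduces the fraction of cells carrying more than one library member at the cost of leaving more cells uninfected, a trade-off to weigh when one library member per cell is wanted.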

In a preferred embodiment, the candidate nucleic acids are introduced into a first plurality of cells, and
the effect of the candidate bioactive agents is screened in a second or third plurality of cells, different
from the first plurality of cells, i.e., generally a different cell type. That is, the effect of the bioactive
agents is due to an extracellular effect on a second cell, i.e., an endocrine or paracrine effect. This is
done using standard techniques. The first plurality of cells may be grown in or on one medium, and the
medium is allowed to touch a second plurality of cells, and the effect measured. Alternatively, there may
be direct contact between the cells. Thus, "contacting" as used herein is a functional contact, and
includes both direct and indirect contact. In this embodiment, the first plurality of cells may or may not
be screened.

If necessary, the cells are subjected to conditions suitable for the expression of the candidate nucleic
acids, for example, when inducible promoters are used, to produce the candidate expression
products, either translation or transcription products. Expression of the candidate agents results in
functional contact of the candidate agent and the cell. Thus, in a preferred embodiment, the methods of
the present invention comprise introducing candidate nucleic acids into a plurality of cells, a cellular
library. The plurality of cells is then screened, as is more fully outlined below, for a cell exhibiting an
altered phenotype. The altered phenotype is due to the presence of a candidate bioactive agent.

By "altered phenotype" or "changed physiology" or other grammatical equivalents herein is meant that
the phenotype of the cell is altered in some way, preferably in some detectable and/or measurable
way. As will be appreciated in the art, a strength of the present invention is the wide variety of cell
types and potential phenotypic changes which may be tested using the present methods. Accordingly,
any phenotypic change which may be observed, detected, or measured may be the basis of the
screening methods herein. Suitable phenotypic changes include, but are not limited to: gross physical
changes such as changes in cell morphology, cell growth, cell viability, adhesion to substrates or other
cells, and cellular density; changes in the expression of one or more RNAs, proteins, lipids, hormones,
cytokines, or other molecules; changes in the equilibrium state (i.e., half-life) of one or more RNAs,
proteins, lipids, hormones, cytokines, or other molecules; changes in the localization of one or more
RNAs, proteins, lipids, hormones, cytokines, or other molecules; changes in the bioactivity or specific
activity of one or more RNAs, proteins, lipids, hormones, cytokines, receptors, or other molecules;
changes in the secretion of ions, cytokines, hormones, growth factors, or other molecules; alterations
in cellular membrane potentials, polarization, integrity or transport; changes in infectivity, susceptibility,
latency, adhesion, and uptake of viruses and bacterial pathogens; etc. By "capable of altering the
phenotype" herein is meant that the candidate agent can change the phenotype of the cell in some
detectable and/or measurable way.

The altered phenotype may be detected in a wide variety of ways, as is described more fully below,
and will generally depend on and correspond to the phenotype that is being changed. Generally, the
changed phenotype is detected using, for example: microscopic analysis of cell morphology; standard
cell viability assays, including both increased cell death and increased cell viability, for example, cells
that are now resistant to cell death via virus, bacteria, or bacterial or synthetic toxins; standard labeling
assays such as fluorometric indicator assays for the presence or level of a particular cell or molecule,
including FACS or other dye staining techniques; biochemical detection of the expression of target
compounds after killing the cells; etc. In some cases, as is more fully described herein, the altered
phenotype is detected in the cell in which the randomized nucleic acid was introduced; in other
embodiments, the altered phenotype is detected in a second cell which is responding to some
molecular signal from the first cell.

In a preferred embodiment, once a cell with an altered phenotype is detected, the cell is isolated from
the plurality of cells which do not have altered phenotypes. Isolation of the altered cell may be done in
any number of ways, as is known in the art, and will in some instances depend on the assay or screen.
Suitable isolation techniques include, but are not limited to, FACS; lysis selection using complement;
cell cloning; scanning by FluorImager; expression of a "survival" protein; induced expression of a cell
surface protein or other molecule that can be rendered fluorescent or taggable for physical isolation;
expression of an enzyme that changes a non-fluorescent molecule to a fluorescent one; overgrowth
against a background of no or slow growth; death of cells and isolation of DNA or other cell vitality
indicator dyes, etc.
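A FACS-style isolation step can be sketched as a two-parameter gate: cells are first required to express the rGFP or pGFP reporter (indicating that the candidate agent is expressed) and then to show the altered phenotype. This is a hypothetical sketch; the Event fields, the 99th-percentile negative-control gate, and the thresholds in the example are illustrative assumptions, not parameters from the text.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Event:
    cell_id: int
    gfp: float        # rGFP/pGFP reporter fluorescence (arbitrary units)
    phenotype: float  # signal reporting the phenotype under selection


def threshold_from_control(control: list[float], percentile: int = 99) -> float:
    """Gate placed at a percentile of a negative-control population."""
    return statistics.quantiles(control, n=100)[percentile - 1]


def gate(events: list[Event], gfp_min: float, phenotype_min: float) -> list[int]:
    """IDs of cells that express the reporter and show the altered phenotype."""
    return [e.cell_id for e in events
            if e.gfp >= gfp_min and e.phenotype >= phenotype_min]
```

Requiring reporter expression first means that a phenotypic change in a cell that never expressed a candidate agent is not counted as a hit.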

In a preferred embodiment, the candidate nucleic acid and/or bioactive agent is isolated from the
positive cell. In one aspect, primers complementary to DNA regions common to the expression
constructs, or to specific components of the library such as a rescue sequence, defined above, are
used to "rescue" the unique random sequence. Alternatively, the bioactive candidate agent is isolated
using a rescue sequence. For example, rescue sequences comprising epitope tags or purification
sequences may be used to pull out the bioactive candidate agent using immunoprecipitation or affinity
columns. In some instances, as is outlined below, this may also pull out the primary target molecule if
there is a sufficiently strong binding interaction between the bioactive agent and the target molecule.
Alternatively, the peptide may be detected using mass spectrometry.
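Rescue with primers that anneal to constant regions of the expression construct recovers the variable insert, which is then sequenced. The sketch below, a hypothetical illustration, locates the insert between two constant flanking sequences in a sequencing read and translates it with the standard genetic code; the flank sequences used in any example are placeholders.

```python
import itertools
from typing import Optional

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def extract_insert(read: str, left: str, right: str) -> Optional[str]:
    """Return the variable sequence between two constant flanks, or None."""
    start = read.find(left)
    if start < 0:
        return None
    start += len(left)
    end = read.find(right, start)
    if end < 0:
        return None
    return read[start:end]


def translate(dna: str) -> str:
    """Translate an uppercase ORF in frame 0, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)
```

For a read containing GGATCC, then ATGGCTTGGTAA, then CTCGAG, extract_insert returns ATGGCTTGGTAA and translate returns the peptide MAW.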

Once rescued, the sequence of the candidate agent and/or bioactive nucleic acid is determined. This
information can then be used in a number of ways.

In a preferred embodiment, the candidate agent is resynthesized and reintroduced into the target cells,
to verify the effect. For mammalian cells, this may be done using retroviruses, or alternatively using
fusions to the HIV-1 Tat protein, and analogs and related proteins, which allow very high uptake into
target cells (see for example, Fawell, S. et al. (1994) Proc. Natl. Acad. Sci. USA 91: 664-68; Frankel,
A.D. et al. (1988) Cell 55: 1189-93; Savion, N. et al. (1981) J. Biol. Chem. 256: 1149-54; Derossi, D. et
al. (1994) J. Biol. Chem. 269: 10444-50; and Baldin, V. et al. (1990) EMBO J. 9: 1511-17, all of which
are incorporated by reference).

In a preferred embodiment, the sequence of a candidate agent is used to generate more candidate
bioactive agents. For example, the sequence of the candidate agent may be the basis of a second
round of (e.g., biased) randomization, to develop other candidate agents with increased or altered
activities. Alternatively, the second round of randomization may change the affinity of the candidate
agent. Furthermore, it may be desirable to put the identified random region of the candidate agent into
other presentation structures, or to alter the sequence of the constant region of the presentation
structure, to alter the conformation/shape of the candidate agent. It may also be desirable to "walk"
around a potential binding site, in a manner similar to the mutagenesis of a binding pocket, by keeping
one end of the ligand region constant and randomizing the other end to shift the binding of the peptide
around.
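Both second-round strategies mentioned above, biased randomization around an identified sequence and "walking" by holding one end constant, can be sketched as sequence generators. This is a hypothetical illustration at the peptide level; a real library would be built at the DNA level (for example with doped oligonucleotides), and the function names and the per-residue keep probability are assumptions.

```python
import random

RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def walk_variants(peptide: str, fixed: int, keep_start: bool,
                  n: int, rng: random.Random) -> list[str]:
    """Keep `fixed` residues at one end and randomize the remainder."""
    if not 0 <= fixed <= len(peptide):
        raise ValueError("fixed must be between 0 and len(peptide)")
    n_random = len(peptide) - fixed
    variants = []
    for _ in range(n):
        random_part = "".join(rng.choice(RESIDUES) for _ in range(n_random))
        if keep_start:
            variants.append(peptide[:fixed] + random_part)   # N-terminus held
        else:
            variants.append(random_part + peptide[n_random:])  # C-terminus held
    return variants


def biased_variant(peptide: str, keep_prob: float, rng: random.Random) -> str:
    """Doped randomization: keep each residue with probability keep_prob,
    otherwise substitute a uniformly chosen residue (which may be the same)."""
    return "".join(aa if rng.random() < keep_prob else rng.choice(RESIDUES)
                   for aa in peptide)
```

Alternating which end is held constant between rounds is one way to shift the randomized window across a binding site.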

In a preferred embodiment, either the candidate agent or the candidate nucleic acid encoding it is
used to identify target molecules. As will be appreciated by those in the art, there may be primary
target molecules, to which the candidate agent binds or acts upon directly, and there may be
secondary target molecules, which are part of the signaling pathway affected by the bioactive agent;
these might be termed "validated targets".

In a preferred embodiment, the bioactive agent is used to pull out target molecules. For example, as
outlined herein, if the target molecules are proteins, the use of epitope tags or purification sequences
can allow the purification of primary target molecules via biochemical means (e.g.,
co-immunoprecipitation, affinity columns, etc.). Alternatively, the peptide, when expressed in bacteria
and purified, can be used as a probe against a bacterial cDNA expression library made from mRNA of
the target cell type. Alternatively, peptides can be used as "bait" in either yeast or mammalian two- or
three-hybrid systems. Such interaction cloning approaches have been very useful in isolating
DNA-binding proteins and other interacting protein components. The peptide(s) can be combined with
other pharmacologic activators to study the epistatic relationships of signal transduction pathways in
question. It is also possible to synthetically prepare labeled peptide candidate agent and use it to
screen a cDNA library expressed in bacteriophage for those expressed cDNAs which bind the peptide.
Furthermore, it is also possible that one could use cDNA cloning via retroviral libraries to
"complement" the effect induced by the peptide. In such a strategy, the peptide would be required to
be stoichiometrically titrating away some important factor for a specific signaling pathway. If this
molecule or activity is replenished by over-expression of a cDNA from within a cDNA library, then one
can clone the target. Similarly, cDNAs cloned by any of the above yeast or bacteriophage systems
can be reintroduced to mammalian cells in this manner to confirm that they act to complement
function in the system the peptide acts upon.

Once primary target molecules have been identified, secondary target molecules may be identified in
the same manner, using the primary target as the "bait". In this manner, signaling pathways may be
elucidated. Similarly, bioactive agents specific for secondary target molecules may also be discovered
to identify a number of bioactive agents acting on a single pathway, for example, when developing
combination therapies.
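The iterative use of each identified target as the next "bait" amounts to a breadth-first search over interaction screens. The sketch below is a hypothetical illustration: screen() stands in for whatever interaction assay is used (for example a two-hybrid screen), and any interaction data supplied to it are invented for the example.

```python
from collections import deque
from typing import Callable, Dict, Set


def map_pathway(agent: str, screen: Callable[[str], Set[str]],
                max_depth: int = 2) -> Dict[str, int]:
    """Use each newly identified target as the next bait.

    `screen(bait)` returns the partners found for that bait.
    Returns {target: depth}; depth 1 = primary, depth 2 = secondary."""
    depth: Dict[str, int] = {}
    seen = {agent}
    queue = deque([(agent, 0)])
    while queue:
        bait, d = queue.popleft()
        if d >= max_depth:
            continue  # do not screen beyond the requested depth
        for partner in sorted(screen(bait)):
            if partner not in seen:
                seen.add(partner)
                depth[partner] = d + 1
                queue.append((partner, d + 1))
    return depth
```

Recording the depth at which each target is first found keeps primary targets distinguishable from secondary ones, which is the distinction the text draws between directly bound and pathway ("validated") targets.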

The methods of the present invention may be useful for screening a large number of cell types under
a wide variety of conditions. Generally, the host cells are cells that are involved in disease states, and
they are tested or screened under conditions that normally result in undesirable consequences on the
cells. When a suitable bioactive candidate agent is found, the undesirable effect may be reduced or
eliminated. Alternatively, normally desirable consequences may be reduced or eliminated, with an eye
towards elucidating the cellular mechanisms associated with the disease state or signaling pathway.

In view of all the foregoing, the compositions and methods described herein are useful in a variety of
applications. In one preferred embodiment, the compositions of the present invention are useful as
reporters for gene expression. In these applications, the compositions may be operably linked to
promoter elements to provide a measure of gene expression. When used with separation sequences
as a downstream gene of interest, the rGFP or pGFP provides a basis for monitoring levels of
expression of the upstream gene of interest.

In another preferred embodiment, the compositions of the present invention are useful for tracking and
localizing proteins. In these embodiments, proteins or peptides are fused to rGFP or pGFP, which
serve as reporters for monitoring localization of proteins to subcellular compartments; assessing
intracellular trafficking of proteins; or examining protein-protein interactions, protein-nucleic acid
interactions, and protein interactions with other molecules.

Since protein-interaction domains serve as a basis for many cellular processes and cell signaling
events, preferred embodiments of the present invention further comprise substrates for enzymatic
reactions, such as those of proteases, kinases and phosphatases, and further serve as intracellular
biosensors that provide information about the physiological state of the cell.

In other preferred embodiments, the compositions of the present invention are useful as candidate
agents in the form of random nucleic acids, cDNAs, cDNA fragments or genomic DNA fragments
fused to the rGFP or pGFP gene. These GFP fusions provide a basis for monitoring expression and
localization of the candidate agent and, importantly, serve as a scaffold for constraining the peptide
for presentation in a biologically active form. In addition, the GFP moiety is useful as a rescue
sequence and for pulling out cellular targets of the candidate agents.

In these embodiments, the methods outlined herein are used to screen for modulators of cellular
phenotypes. Cellular phenotypes that may be assayed include, but are not limited to, cell apoptosis,
cell cycle, exocytosis, cytokine secretion, cell adhesion, signal transduction, protein interaction, etc.
As will be appreciated by those in the art, any number of cellular assays that rely on rGFP or pGFP
and their variants can be developed.

In one preferred embodiment, the rGFP or pGFP can be used to evaluate, test and screen promoters.
Thus, in this embodiment, the present invention provides compositions comprising a promoter of
interest and a gene encoding an rGFP or pGFP. Alternatively, the compositions comprise a promoter
operably linked to a gene of interest, a separation sequence, and a gene encoding rGFP or pGFP.
Preferably, the promoter is not the native rGFP or pGFP promoter.

In a preferred embodiment, the fusion nucleic acids are used to screen for modulators of promoter
activity. By "modulation" of promoter activity herein is meant an increase or decrease in transcription of
the fusion nucleic acid regulated by the promoter of interest. Various promoters of different organisms
are amenable to analysis, including promoters of bacterial, yeast, viral, insect, plant, and mammalian
cells. In mammalian cells, examples of relevant promoters are the IL-4 inducible ε promoter, the IgH
promoter, NF-κB regulated promoters, APC/β-catenin regulated promoters, myc regulated promoters,
cell-specific promoters (peripheral nervous system, central nervous system, kidney, skin, bone, lung,
heart, liver, bladder, ovary, testes, colon, etc.), cytokine regulated promoters, stress regulated
promoters (e.g., heat shock), circadian rhythm regulated promoters, and promoters regulating HIV
viral gene expression and cell cycle genes. Preferred are promoters that regulate expression of signal
transduction proteins, cell cycle regulatory proteins, oncogenes, or promoters which are themselves
regulated by signal transduction pathways, cell cycle regulators, or other aspects of cell regulatory
networks.

Candidate agents are contacted with the cells comprising the fusion nucleic acid and examined for
effects on reporter gene expression (see for example, WO 99/58663, hereby expressly incorporated
by reference). If the promoter is inducible, the promoter is induced with an appropriate stimulus or
effector. Alternatively, the promoter is induced prior to addition of the candidate bioactive agents, or
simultaneously. For example, for the IL-4 inducible ε promoter, addition of the cytokine IL-4 or IL-13 to
the cells (e.g., IL-4 at not less than 5 units/ml and at a preferred concentration of 200 units/ml) can
induce transcription of the ε promoter. Screening of candidate agents affecting inducible expression of
the reporter will allow identification of cellular targets involved in signal transduction events mediated
by the cytokine.
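Scoring candidate agents against an induced promoter-reporter construct can be sketched as a fold-change comparison of rGFP or pGFP fluorescence. In this hypothetical sketch, each agent's median reporter signal is compared with induced cells receiving no agent (for the ε promoter, cells treated with IL-4 alone); the twofold cut-off is an arbitrary assumption, not a value from the text.

```python
import statistics
from typing import Sequence


def fold_change(sample: Sequence[float], induced_control: Sequence[float]) -> float:
    """Median reporter signal relative to induced cells without agent."""
    return statistics.median(sample) / statistics.median(induced_control)


def classify(sample: Sequence[float], induced_control: Sequence[float],
             threshold: float = 2.0) -> str:
    """Label an agent by how far it moves induced reporter expression."""
    fc = fold_change(sample, induced_control)
    if fc <= 1 / threshold:
        return "repressor"
    if fc >= threshold:
        return "enhancer"
    return "no effect"
```

Medians are used here rather than means so that a few very bright or dead cells in a well do not dominate the score.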

To provide a more stringent selection for promoter regulators, the fusion nucleic acid may comprise a
promoter, a rGFP or pGFP, a separation sequence, and a reporter/selection gene distinguishable from
rGFP or pGFP. The GFP allows selection of cells expressing the fluorescent protein, while the
reporter/selection gene provides an additional basis for selecting cells. In one aspect, the
reporter/selection gene may be a death gene, i.e., a nucleic acid encoding a protein that causes cell
death. Preferably, cell death requires a two-step process: expression of the death gene and induction
of the death phenotype by a signal or ligand. This two-step process is desirable when the promoter
being analyzed is constitutively active. For example, if the selection gene is thymidine kinase (TK),
the cells can be selected based on killing by ganciclovir, since TK activity is needed for ganciclovir
toxicity. Alternatively, the selection gene may encode the heparin-binding epidermal growth factor
(HBEGF) protein, and killing is initiated by adding diphtheria toxin. Thus, candidate agents that
repress promoter activity are readily identified by selecting for cells that are resistant to cell death
and lack GFP expression. The presence of a separation sequence, such as Type 2A, allows expression of
both reporter and selection genes from a single transcript, thus providing a sensitive indicator of
promoter activity. Verification of the presence of the death gene is preferred to keep the level of
false positives low; that is, survival of cells in the screen should be due to the presence of an
inhibitor of the promoter rather than a lack of the death gene.
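The hit-calling rule of this two-step screen can be written as a simple filter. The following Python sketch is illustrative only: the `Cell` record, its field names, and the reduction of each readout to a boolean are assumptions standing in for whatever survival, fluorescence, and death-gene verification data a given screen actually produces.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical per-cell (or per-clone) readouts from a two-step death-gene screen."""
    cell_id: str
    survived_death_signal: bool  # alive after e.g. ganciclovir (TK) or diphtheria toxin (HBEGF)
    gfp_positive: bool           # rGFP/pGFP expressed from the same transcript as the death gene
    death_gene_verified: bool    # death gene confirmed present, e.g. by PCR


def call_repressor_hits(cells: list[Cell]) -> list[str]:
    """Return IDs of cells consistent with a candidate agent that represses the promoter.

    A hit must (1) survive the death signal, (2) lack GFP, because reporter and
    death gene share one transcript, and (3) still carry the death gene, so that
    survivors which simply lost the death gene are not counted as hits.
    """
    return [
        c.cell_id
        for c in cells
        if c.survived_death_signal and not c.gfp_positive and c.death_gene_verified
    ]


if __name__ == "__main__":
    screen = [
        Cell("clone-1", True, False, True),   # repressed promoter: hit
        Cell("clone-2", True, False, False),  # lost the death gene: false positive
        Cell("clone-3", False, True, True),   # promoter active, cell killed
    ]
    print(call_repressor_hits(screen))  # ['clone-1']
```

Requiring all three conditions together is what keeps false positives low: GFP loss alone could reflect silencing, and survival alone could reflect loss of the death gene.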

In another preferred embodiment, an inducible promoter may be linked to "one-step" death genes (e.g.,
diphtheria toxin A fragment). In this embodiment, the inducible promoter is leaky, such that small
amounts of the death gene product and the reporter protein (e.g., rGFP or pGFP) are expressed. This low
level of reporter gene expression allows selection of cells containing the death gene, which avoids
false positives. These cells are then contacted with candidate agents, and the promoter is induced to
express the death gene. Selection of surviving cells enriches for cells containing agents that inhibit
the promoter.

For examining promoters regulated by specific signal transduction pathways, cells capable of
transducing the signal are used. For example, for the IL-4 inducible ε promoter system, any cells that
express an IL-4 receptor that transduces the IL-4 signal to the nucleus and alters transcription can be
used. Suitable cells include, but are not limited to, human cells and cell lines that show IL-4/13
inducible production of germline ε transcripts, including, but not limited to, DND39 (see Watanabe,
supra), MC-116 (Kumar, et al. (1990) Eur. Cytokine Netw. 1: 109), and CA-46 (Wang, et al. (1996) J.
Natl. Cancer Inst. 88: 956). As noted herein, the ability of MC-116 and CA-46 cells to produce germline
ε transcripts upon IL-4/13 induction was not known prior to the present invention. Thus, preferred
embodiments provide for MC-116 and/or CA-46 cells comprising recombinant nucleic acid reporter
constructs as outlined herein.

In another preferred embodiment, the fusion construct comprises an endogenous promoter and an
exogenous rGFP or pGFP gene. In this context, "endogenous" means present within the host cell.
In this regard, an exogenous rGFP, pGFP, or variant thereof is incorporated into the genome such
that the reporter gene is under the control of the endogenous promoter. These constructions are
desirable for examining and modulating the full range of endogenous regulation, particularly promoter
control elements (e.g., enhancers, inhibitory elements, etc.) other than the promoter fragment.

Generating the endogenous-exogenous fusion construct may proceed in any number of ways, depending
on the organism used. In one preferred embodiment, homologous recombination mechanisms present in
different organisms provide the basis for inserting the exogenous reporter gene to form the fusion
construct. That is, gene "knock-in" constructions are made, whereby an exogenous rGFP or pGFP gene as
outlined herein is added, via homologous recombination, to the genome, such that the reporter gene is
under the control of the endogenous promoter. Homologous recombination methods are well known in the
art (see Westphal, et al. (1997) Current Biology 7: R530-R533 and references cited therein; Rothstein,
R. (1991) Methods Enzymol. 194: 281-301; Kaur, R. (1997) Nucleic Acids Res. 25: 1080-81; and Miller,
J.H., A Short Course in Bacterial Genetics: A Laboratory Manual and Handbook for Escherichia coli and
Related Bacteria, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1992). These
homologous recombination driven methods may use recA or recA-type proteins to enhance the
recombination process (see PCT US93/03868, hereby incorporated by reference).

In another preferred embodiment, the "knock-ins" are selected by FACS on the basis of incorporation of
the rGFP or pGFP gene. Thus, in one aspect, a first homologous recombination event places a rGFP or
pGFP gene into at least one allele of the cell genome. When the promoter is the IL-4 inducible
promoter, a cell type that exhibits IL-4 inducible production of at least germline ε transcripts is
preferred, so that the cells may be tested for IL-4 inducible reporter gene expression. That is,
transformed cells are selected by FACS for reporter gene expression upon treatment with IL-4. Suitable
cells include, but are not limited to, human cells and cell lines that show IL-4/13 inducible
production of germline ε transcripts. Preferably, once a first endogenous promoter has been combined
with an exogenous reporter construct, a second homologous recombination event may be done, preferably
using a second reporter gene different from the first, to target the other allele of the cell genome,
and tested as above. Generally, IL-4 induction of the rGFP or pGFP genes will indicate the correct
placement of the genes, which can be confirmed by sequencing, such as PCR sequencing, or by Southern
blot hybridization. In addition, preferred embodiments utilize pre-screening steps to remove "leaky"
cells, i.e., those showing constitutive expression of the rGFP or pGFP gene.
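The selection of knock-in clones, including the pre-screen that removes leaky cells, amounts to comparing each clone's reporter signal without and with IL-4. In this minimal sketch the function name, the single fluorescence threshold, and the three category labels are hypothetical; a real gate would be set from control populations.

```python
def classify_knockin_clone(uninduced: float, induced: float, threshold: float) -> str:
    """Classify a knock-in clone from two reporter fluorescence readings.

    uninduced: rGFP/pGFP fluorescence without IL-4
    induced:   rGFP/pGFP fluorescence after IL-4 treatment
    threshold: hypothetical gate separating GFP-negative from GFP-positive
    """
    if uninduced >= threshold:
        return "leaky"      # constitutive expression: removed in the pre-screen
    if induced >= threshold:
        return "candidate"  # IL-4 inducible: confirm placement by sequencing or Southern blot
    return "negative"       # no inducible reporter expression


if __name__ == "__main__":
    print(classify_knockin_clone(20.0, 800.0, threshold=100.0))  # candidate
```

Checking the uninduced reading first mirrors the order in the text: leaky cells are discarded before inducibility is considered.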

In another preferred embodiment, endogenous-exogenous fusion constructs are made via site-specific
recombination. In these embodiments, the site-specific recombination sequence, such as loxP, is
inserted into the desired site(s), preferably by homologous recombination, although random insertions
are possible with other vectors depending on the cell type being used (e.g., phage Mu, retroviral
vectors). Following generation of cells containing the site-specific sites, a vector comprising the
rGFP or pGFP and an appropriately placed loxP site is introduced into the cell. Expressing the Cre
recombinase allows recombination between the loxP sites on the two separate nucleic acids, thus
resulting in insertion of the vector into the chromosomally located loxP site.

As above, these cells are induced with the appropriate inducer if the endogenous promoter of interest
is inducible, and are then contacted with candidate agents. When the cells comprise fusion nucleic
acids expressing candidate agents comprising rGFP or pGFP fusion proteins, or candidate agents
expressed from a fusion nucleic acid comprising a first gene of interest, a separation sequence, and a
second gene of interest comprising a rGFP or pGFP, a reporter gene distinguishable from the rGFP or
pGFP proteins is used to monitor promoter modulation. This strategy allows simultaneous monitoring
of the expression of the candidate agent and the activity of the promoter.

In another preferred embodiment, fusion nucleic acids comprising rGFP or pGFP and a weak promoter or
no promoter are inserted into a host chromosome to scan for promoter elements on the host chromosome.
In a preferred embodiment, this may be done conveniently by using a viral backbone for constructing the
fusion nucleic acids. For example, in bacteria, phage Mu systems allow random insertions into the host
chromosome, while in mammalian cells, retroviral vectors provide a suitable vehicle for inserting the
fusion nucleic acids into the host chromosome. When retroviral vectors are used, SIN
(self-inactivating) vectors lacking viral promoters are preferred, so that the reporter gene is
transcribed or activated from endogenous promoters or promoter regulatory elements upon insertion of
the viral DNA into the host chromosome. Expression of rGFP or pGFP indicates insertion near an
endogenous promoter. Identifying cells expressing the reporter gene upon treatment with inducers allows
identification of promoters regulated by the inducing agent. Cells comprising these insertions are
contacted with candidate agents, for example, by expressing candidate nucleic acids or proteins in the
cells. Agents modulating promoter activity are identified based on expression of the rGFP or pGFP
reporter.

In the endogenous-exogenous fusion constructs described above, the exogenous fusion nucleic acid
used to monitor promoter activity may comprise a rGFP or pGFP, or a fusion nucleic acid comprising a
first gene of interest, a separation sequence, and a second gene of interest comprising a rGFP or
pGFP. The latter construct allows identifying cells based on expression of two reporter/selection
genes if the first gene of interest encodes a reporter gene distinguishable from rGFP or pGFP.

In addition, in a preferred embodiment, the fusion nucleic acids of the present invention may also
contain site-specific recombination sites for deleting or rearranging the fusion nucleic acids when
introduced into a cell. As described above, these sequences may comprise loxP or flp sites flanking
the nucleic acid segment to be rearranged. As is well known in the art, the sites are placed in an
appropriate orientation so that either deletion or rearrangement (i.e., inversion) occurs upon contact
of the sequences with a site-specific recombinase. In a preferred embodiment, the site-specific
sequences flank the rGFP or pGFP gene, or flank the fusion nucleic acid comprising a first gene of
interest, a separation sequence, and a second gene of interest comprising rGFP or pGFP. Thus,
deletion or rearrangement removes the fusion nucleic acid to be expressed, or rearranges it so that it
is no longer operably linked to the promoter. In another preferred embodiment, the site-specific
sequences are oriented such that rearrangement induced by the recombinase results in operable linkage
of the fusion nucleic acid to the promoter on the expression vector or to the endogenous promoter. This
may be desirable when examining promoters active at specific stages in cell development or in
examining cell lineage.
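The dependence of the outcome on site orientation can be modeled on a construct represented as an ordered list of elements, each with a strand. This sketch assumes exactly two recombination sites and the standard behavior of Cre/loxP and Flp-type systems (directly repeated sites excise, inverted sites invert); the `recombine` function and the element names are hypothetical, and the model ignores sequence-level details such as spacer identity.

```python
Element = tuple[str, str]  # (element name, strand "+" or "-")


def recombine(construct: list[Element], site: str = "loxP") -> list[Element]:
    """Model one recombinase event between exactly two sites named `site`.

    Directly repeated sites (same strand) excise the segment between them,
    leaving a single site; inverted sites (opposite strands) invert that
    segment, reversing element order and flipping each element's strand.
    """
    positions = [i for i, (name, _) in enumerate(construct) if name == site]
    if len(positions) != 2:
        raise ValueError(f"expected exactly two {site} sites, found {len(positions)}")
    i, j = positions
    if construct[i][1] == construct[j][1]:
        # Excision: keep the first site and everything outside the pair.
        return construct[: i + 1] + construct[j + 1 :]
    # Inversion: the intervening segment is flipped in place.
    flip = {"+": "-", "-": "+"}
    inverted = [(name, flip[strand]) for name, strand in reversed(construct[i + 1 : j])]
    return construct[: i + 1] + inverted + construct[j:]


if __name__ == "__main__":
    # Inverted sites: an antisense rGFP is flipped into operable orientation.
    before = [("promoter", "+"), ("loxP", "+"), ("rGFP", "-"), ("loxP", "-")]
    print(recombine(before))
    # [('promoter', '+'), ('loxP', '+'), ('rGFP', '+'), ('loxP', '-')]
```

The inversion case corresponds to the embodiment in which recombination creates operable linkage; the excision case corresponds to removing the reporter from the promoter.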

In a preferred embodiment, the fusion nucleic acids of the present invention are used to identify
candidate agents that alter a cellular phenotype. In these embodiments, the fusion nucleic acids of
the present invention provide, among other things, a way of detecting or monitoring a cellular
phenotype, including the phenotype being examined, and of measuring synthesis of a gene of interest,
such as the candidate agents to be screened.
Accordingly, in one preferred embodiment the fusion nucleic acids of the present invention find use in
screens for cells with altered exocytosis. By "alteration" or "modulation" in relation to exocytosis is
meant a decrease or increase in the amount or frequency of exocytosis in one cell compared to another
cell, or in the same cell under different conditions. Often mediated by specialized cells, exocytosis
is vital for a variety of cellular processes, including neurotransmitter release by neurons, hormone
release by adrenal chromaffin cells (e.g., adrenaline) and pancreatic β-cells (e.g., insulin), and
histamine release by B-cells.

Disorders involving exocytosis are numerous. For example, the inflammatory immune response mediated
by mast cells leads to a variety of disorders, including asthma and allergies. Therapy for allergy
remains limited to blocking mediators released by mast cells (e.g., antihistamines) and to non-specific
anti-inflammatory agents, such as steroids and mast cell stabilizers. These treatments are only
marginally effective in alleviating the symptoms of allergy. To identify cellular targets for drug
design or candidate effectors of exocytosis, fusion nucleic acids expressing GFP fusion proteins (e.g.,
fused to random peptides) or expressing a gene of interest comprising candidate agents may be
introduced into appropriate cells, for example mast cells, which are then selected for modulation of
exocytosis by assaying for changes in cellular exocytosis properties under various conditions. For
example, the cells may be examined in the presence or absence of physiological signals, such as Ca²⁺,
ionophores, hormones, antibodies, peptides, drugs, antigens, cytokines, growth factors, membrane
potentials, cell-cell contacts, and the like. In other aspects, the measurements are taken under the
same conditions for different cells. These cells are stimulated with an appropriate inducer if
exocytosis is triggered by an inducing signal. Alternatively, cells with a conditional mutation for
exocytosis events are used in screens for candidate agents affecting exocytosis regulators.

In one preferred embodiment, the cells used for screening may be engineered to be defective in
exocytosis. For example, cells may be transformed with a fusion nucleic acid expressing a conditional
gene product whose expression under restrictive conditions produces an exocytosis defect.
Alternatively, the fusion nucleic acid may express a dominant effect protein affecting exocytosis.
Examples of these types of genes of interest are dynamin and Ese1, proteins involved in endocytosis
that indirectly affect exocytosis. Expression of temperature-sensitive conditional mutants of dynamin
or Ese1 in cells can induce endocytosis and exocytosis defects (Damke, H. et al. (1995) J. Cell Biol.
131: 69-80; Damke, H. et al. (1994) J. Cell Biol. 127: 915-934; Sengar, A.S. (1999) EMBO J. 18:
1159-1171). Thus, in a preferred embodiment, the cell may comprise a fusion nucleic acid containing a
conditional dynamin gene, a separation sequence, and a reporter gene comprising rGFP or pGFP.
Expression of the dynamin gene under restrictive conditions disrupts endocytosis, thus resulting in
deficient exocytosis. Candidate agents are screened under the restrictive condition for activation of
exocytosis. When candidate agents comprise GFP fusion proteins (e.g., random peptide or cDNA GFP
fusions), or are expressed as a first gene of interest, a separation sequence, and a second gene of
interest comprising rGFP or pGFP, the reporter gene chosen is distinguishable from the expressed GFP.

Assays for changes in exocytosis may comprise sorting cells in a fluorescence-activated cell sorter
(FACS) by measuring alterations of various exocytosis indicators, such as light scattering, fluorescent
dye uptake, fluorescent dye release, granule release, targeting and quantity of granule-specific
proteins (see, for example, WO 99/54494), and capacitance measurements. Use of combinations of
indicators reduces background and increases the specificity of the sorting assay.
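Combining indicators in a sort is, in its simplest form, a logical AND over per-indicator gates. The sketch below assumes each cell has already been reduced to a dictionary of indicator values; the indicator names and gate bounds in the example are hypothetical, not values from this document.

```python
def multi_indicator_gate(events: list[dict[str, float]],
                         gates: dict[str, tuple[float, float]]) -> list[dict[str, float]]:
    """Keep only events that fall inside every gate (logical AND of indicators).

    events: one dict per cell, mapping indicator name -> measured value
    gates:  indicator name -> (low, high) inclusive bounds
    """
    return [
        event for event in events
        if all(low <= event[name] <= high for name, (low, high) in gates.items())
    ]


if __name__ == "__main__":
    # Hypothetical side-scatter and FM1-43 uptake values for three cells.
    cells = [
        {"side_scatter": 200.0, "fm1_43": 900.0},
        {"side_scatter": 200.0, "fm1_43": 100.0},
        {"side_scatter": 900.0, "fm1_43": 900.0},
    ]
    gates = {"side_scatter": (0.0, 400.0), "fm1_43": (500.0, 10_000.0)}
    print(len(multi_indicator_gate(cells, gates)))  # 1
```

Each added gate can only remove events, which is why stacking indicators lowers background at some cost in yield.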

Exocytosis assays may be based on changes in light scattering properties, including the forward and
side scatter properties of the cells, which are indicative of the size, shape, and granule content of
the cell. Multiparameter FACS selections based on the light scattering properties of cells are well
known in the art (see Paretti, M. et al. (1990) J. Pharmacol. Methods 23: 187-94; Hide, I. et al.
(1993) J. Cell Biol. 123: 585-93).

Assays based on uptake of fluorescent dyes reflect the coupling of exocytosis and endocytosis. In
these assays, endocytosis levels indirectly reflect exocytosis levels, since the cell attempts to
maintain cell volume and membrane integrity as the amount of cell membrane rapidly changes when
secretory vesicles fuse with the cell membrane. Preferred fluorescent dyes include styryl dyes, such
as FM1-43, FM4-64, FM14-68, FM2-10, FM4-84, FM1-84, FM14-27, FM14-29, FM3-25, FM3-14,
FM5-65, RH414, FM6-55, FM10-75, FM1-81, FM9-49, FM4-95, FM4-59, FM9-40, and combinations
thereof. Styryl dyes such as FM1-43 are only weakly fluorescent in water but highly fluorescent when
associated with a membrane, such that dye uptake by endocytosis is readily discernible (Betz, et al.
(1996) Current Opinion in Neurobiology 6: 365-371; Molecular Probes, Inc., Eugene, Oregon,
"Handbook of Fluorescent Probes and Research Chemicals", 6th Edition, 1996, particularly Chapter
17, and more particularly Section 2 of Chapter 17, hereby incorporated herein by reference). A useful
solution dye concentration is about 25 to 1000-5000 nM, with from about 50 to about 1000 nM being
preferred, and from about 50 to 250 nM being particularly preferred.
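Reaching these working concentrations from a concentrated dye stock is a C1V1 = C2V2 calculation. In this sketch the 10 µM intermediate stock, the 2 ml volume, and the 100 nM target are illustrative assumptions; only the 50 to 250 nM range check comes from the text.

```python
def stock_volume_ul(stock_um: float, final_nm: float, final_volume_ml: float) -> float:
    """Volume of dye stock (µl) to add, from C1 * V1 = C2 * V2.

    stock_um:        stock concentration in µM (hypothetical; use the actual stock)
    final_nm:        desired working concentration in nM
    final_volume_ml: final volume of medium or cell suspension in ml
    """
    if stock_um <= 0:
        raise ValueError("stock concentration must be positive")
    stock_nm = stock_um * 1_000.0                # µM -> nM
    final_volume_ul = final_volume_ml * 1_000.0  # ml -> µl
    return final_nm * final_volume_ul / stock_nm


def in_particularly_preferred_range(final_nm: float) -> bool:
    """True if the working concentration lies in the 50 to 250 nM range given above."""
    return 50.0 <= final_nm <= 250.0


if __name__ == "__main__":
    print(stock_volume_ul(10.0, 100.0, 2.0))       # 20.0
    print(in_particularly_preferred_range(100.0))  # True
```

An intermediate stock is assumed here because diluting a millimolar stock directly to 100 nM in 2 ml would require pipetting a fraction of a microliter.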

Exocytosis assays based on fluorescent dye release rely on release of dye that is taken up passively
or actively endocytosed by the cell. Release of dyes taken up by a cell results in decreased cellular
fluorescence and presence of the dye in the cellular medium, thus providing two bases for measuring
dye release. For example, styryl dye taken up into cells by endocytosis is released into the cellular
medium by exocytosis, resulting in decreased cellular fluorescence and presence of the dye in the
medium. Another dye release assay uses low pH dyes, such as acridine orange, LYSOTRACKER™ Red,
LYSOTRACKER™ Green, and LYSOTRACKER™ Blue (Molecular Probes, supra), which stain exocytic
granules when the dye is internalized by the cell.

Alternatively, the exocytosis assay relies on release of molecules contained in the granule. In one
aspect, these may be proteins or detectable biomolecules, especially enzymes such as proteases and
glycosidases, released as part of the exocytic process. Many enzymes are inactive within the
granule because of the low pH in the vesicle but become activated when exposed to the extracellular
medium at physiological pH. Preferred granule enzymes include, but are not limited to, chymase,
tryptase, arylsulfatase A, β-hexosaminidase, β-D-galactosidase, and the like. Enzyme activities are
measurable using chromogenic or fluorogenic substrates. The signal generated via cleavage of a
chromogenic or fluorogenic substrate is related to the amount of enzyme present, and thus provides a
measure of exocytosis. If exocytosis is inducible, an inducing signal is used.
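A common way to turn the substrate signal into a measure of exocytosis, used in many granule-enzyme release assays though not specified in this document, is to express the enzyme activity found in the medium as a percentage of the total activity (medium plus lysed cells). The sketch below assumes readings within the linear range of the assay; the function name and arguments are hypothetical.

```python
def percent_release(supernatant: float, lysate: float, blank: float = 0.0) -> float:
    """Percent of total granule enzyme activity released into the medium.

    supernatant: substrate signal (absorbance or fluorescence) from the medium
    lysate:      substrate signal from the lysed cells (enzyme not released)
    blank:       substrate-only background, subtracted from both readings
    """
    released = supernatant - blank
    retained = lysate - blank
    total = released + retained
    if total <= 0:
        raise ValueError("no enzyme activity above background")
    return 100.0 * released / total


if __name__ == "__main__":
    print(round(percent_release(35.0, 85.0, blank=5.0), 1))  # 27.3
```

Normalizing to total activity makes wells with different cell numbers comparable, which matters when comparing cells carrying different candidate agents.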

The fluorogenic substrate may be a substrate that precipitates upon action by the enzyme. For
example, substrates for glucuronidase, such as ELF-97 glucuronide, precipitate through the action of
released enzyme. Other precipitating substrates are well known in the art and commercially available
(see, for example, Molecular Probes, supra, particularly Chapter 10, more particularly Section 2 of
Chapter 10, and referenced related chapters). When the granule-specific proteins comprise biological
mediators released during exocytosis, such as serotonin, histamine, heparin, hormones, etc., these
granule proteins may be identified using specific antibodies.

Preferential staining of exocytic granules when vesicles fuse with the cell membrane provides an
additional assay for measuring exocytosis. Annexin V, which binds the phospholipid phosphatidylserine
in a divalent ion dependent manner, specifically binds to exocytic granules present on the cell surface
but fails to bind internally localized exocytic granules. This property of Annexin provides a basis for
determining exocytosis by the level of Annexin bound to cells. Cells show an increase in Annexin
binding in proportion to the time and intensity of the exocytic response. Annexin is detectable directly
by use of fluorescently labeled Annexin derivatives (e.g., FITC, TRITC, AMCA, APC, or Cy-5
fluorescent labels), or indirectly by use of Annexin modified with a primary label (e.g., biotin), which is
detected using a labeled secondary agent that binds to the primary label (e.g., fluorescently labeled
avidin). In general, changes of 25% from baseline are preferred, with at least about 50% being more
preferred, at least about 100% being particularly preferred, and at least about 500% being especially
preferred. Baseline, as used herein, means the amount of Annexin binding against which binding under
a second state, or in a different cell, is compared.
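The percent change from baseline and the preference tiers above can be computed as follows. This is a minimal sketch: the function names are hypothetical, and treating increases and decreases symmetrically (via the absolute value) is an assumption, since the text speaks only of "changes".

```python
def percent_change(measured: float, baseline: float) -> float:
    """Percent change in Annexin binding relative to the baseline state or cell."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (measured - baseline) / baseline


# Preference tiers from the text, highest threshold first (percent change).
TIERS = [
    (500.0, "especially preferred"),
    (100.0, "particularly preferred"),
    (50.0, "more preferred"),
    (25.0, "preferred"),
]


def preference_tier(measured: float, baseline: float) -> str:
    """Label a change in Annexin binding with the highest tier it reaches."""
    change = abs(percent_change(measured, baseline))  # assumption: either direction counts
    for threshold, label in TIERS:
        if change >= threshold:
            return label
    return "below threshold"


if __name__ == "__main__":
    print(preference_tier(260.0, 100.0))  # particularly preferred
```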

Alternatively, in a preferred embodiment, the exocytosis indicators are engineered into the cells. For
example, recombinant proteins comprising fusion proteins of a granule-specific or secreted protein and
a reporter molecule are expressed in a cell by transforming or transfecting the cells with a fusion
nucleic acid encoding the fusion protein. This is generally done as is known in the art, and will depend
on the cell type. Generally, for mammalian cells, retroviral vectors, including those of the present
invention, are preferred for delivery of the fusion nucleic acid. Preferred reporter molecules include,
but are not limited to, Aequorea victoria GFP, Renilla muelleri GFP, Renilla reniformis GFP, Ptilosarcus
gurneyi GFP, BFP, YFP, and enzymes including luciferases (e.g., Renilla, firefly, etc.) and
β-galactosidases. Presence of the granule protein-reporter fusion construct on the cell surface or
presence of the secreted protein-reporter fusion construct in the medium indicates the level of
exocytosis in the cells. In one preferred embodiment, cells are transformed with vectors expressing a
fusion protein comprising a granule-specific protein, such as synaptobrevin (VAMP) or synaptotagmin,
fused to a GFP reporter molecule. The cells are monitored for localization of the fusion protein to the
cell membrane. By incorporating a separation sequence and a second gene of interest comprising a
distinguishable reporter or selection gene, cells expressing the fusion protein are readily selected.
Moreover, the second gene of interest provides an internal standard to measure the level of fusion
protein in the cell. Candidate agents, for example candidate nucleic acids and candidate peptides,
introduced into these transformed cells are tested for their ability to affect the distribution of the
fusion protein. Alternatively, the fusion protein is detected, directly or indirectly, using an antibody.
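Using the second gene of interest as an internal standard amounts to dividing the membrane-localized fusion signal by the co-expressed reporter signal, cell by cell, before comparing populations. The sketch below assumes both signals are already quantified per cell and summarizes each population by its median ratio; the function names and the choice of the median are illustrative assumptions.

```python
from statistics import median


def localization_ratio(membrane_gfp: float, internal_standard: float) -> float:
    """Membrane-localized GFP-fusion signal per unit of co-expressed reporter."""
    if internal_standard <= 0:
        raise ValueError("internal standard signal must be positive")
    return membrane_gfp / internal_standard


def fold_change(treated: list[tuple[float, float]],
                control: list[tuple[float, float]]) -> float:
    """Median normalized ratio of treated cells divided by that of control cells.

    Each list holds (membrane_gfp, internal_standard) pairs, one per cell.
    Values above 1 suggest increased membrane localization of the fusion protein.
    """
    treated_median = median(localization_ratio(g, s) for g, s in treated)
    control_median = median(localization_ratio(g, s) for g, s in control)
    return treated_median / control_median
```

Normalizing per cell before summarizing keeps cells that merely express more of the construct from looking like cells with more exocytosis.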

In another preferred embodiment, the methods are used to examine cell cycle regulation.
Complicated regulatory pathways control cell cycle progression. These regulatory molecules include,
among others, cellular receptors, cyclins, cyclin-dependent kinases, cyclin-dependent kinase inhibitors,
cell division cycle phosphatases (CDC), ubiquitin ligases and ubiquitin-mediated proteases, tumor
suppressor proteins (e.g., cell cycle checkpoint regulators), and transcription factors. Cell cycle
regulation is implicated in tumor formation and immune system regulation. The compositions of the
present invention are used to identify candidate agents producing an altered cell cycle phenotype,
such as activation or suppression of a cell cycle checkpoint. In one aspect, the candidate agents are
fusion nucleic acids expressing candidate peptides fused to rGFP or pGFP. These candidate agents
are introduced into cells in the form of vectors, preferably retroviral vectors when mammalian cells are
used. In another aspect, the candidate agents are nucleic acids, peptides, cDNAs, and genomic
DNAs expressed as a gene of interest. When these candidate agents comprise peptides and
proteins, the fusion nucleic acid may further comprise a separation sequence and a rGFP or pGFP to
produce separate proteins and to monitor expression of the candidate agent.

In another preferred embodiment, the fusion nucleic acids of the present invention are used to express
cell cycle regulators, or mutants of cell cycle regulatory proteins, which produce a cell cycle
phenotype in the cells. In one aspect, the fusion nucleic acids may comprise a gene of interest
comprising a cell cycle regulator, which induces a cell cycle phenotype when expressed. A separation
sequence and a reporter gene, such as rGFP or pGFP, allow monitoring of expression of the gene of
interest. When the candidate agent comprises rGFP or pGFP fusion proteins, or when the candidate
agent is expressed from a fusion nucleic acid comprising a first gene of interest, a separation
sequence, and a second gene of interest comprising rGFP or pGFP, a distinguishable reporter gene
(e.g., blue fluorescent protein) is used to monitor expression of the cell cycle regulator. Candidate
agents are then introduced into the cells to identify those agents altering the induced cell cycle
phenotype.

The cell cycle may be examined by a variety of methods well known to those skilled in the art (see, for
example, US 2001/0003042, which is expressly incorporated by reference). The assays permit
determining whether cell cycle arrest occurs (i.e., cell proliferation assays) and at which specific cell
cycle stage it occurs (i.e., cell phase assays). By measuring or assaying one or more of these
parameters, it is possible to detect alterations in cell cycle regulation and also alterations of different
steps of the cell cycle regulatory pathway. "Alteration" and "modulation", as used herein, include both
increases and decreases in the cellular parameter being measured. In a preferred embodiment, the
alteration results in a change in the cell cycle of a cell, i.e., a proliferating cell arrests in any one of
the phases, or an arrested cell moves out of its arrested phase to progress through the cell cycle, as
compared to another cell or the same cell under different conditions. Alternatively, the progress of a
cell through any particular phase may be altered; that is, there may be an acceleration or delay in the
time for the cell to move through a particular growth phase.

In a preferred embodiment, a proliferation assay is used. By "proliferation assay" herein is meant an
assay that allows determining whether a cell population is proliferating, i.e., replicating or not
replicating. In one preferred embodiment, the proliferation assay is a dye inclusion assay. A dye
inclusion assay relies on uptake of dye by cells and subsequent dilution of the dye by succeeding
rounds of cell division. Generally, the dye may be introduced in several ways. Either the dye cannot
passively enter the cells (e.g., the dye is charged) and the cells are induced to take up the dye, or the
dye passively enters the cells and is subsequently modified to limit diffusion out of the cells. For
example, Molecular Probes CellTracker dyes comprise chloromethyl derivatives of fluorescent
compounds that freely diffuse into cells and are subsequently modified by glutathione S-transferase,
which renders the dyes membrane impermeant. Suitable inclusion dyes include, but are not limited to,
CellTracker dyes, including, but not limited to, CellTracker Yellow-Green, CellTracker Green, and
CellTracker Orange, PKH26 (Sigma), and others well known in the art (see Molecular Probes
Handbook, supra).
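Because the dye is diluted by each round of division, the number of divisions a labeled cell has undergone can be estimated from its remaining fluorescence. This sketch assumes ideal halving of the dye at every division and no other dye loss; it is an approximation commonly applied to dye-dilution data, not a method described in this document, and the function name is hypothetical.

```python
import math


def divisions_from_dye(initial_intensity: float, current_intensity: float) -> float:
    """Estimated number of divisions since labeling, log2(I0 / I).

    Assumes the dye is split equally between daughter cells at each division
    (so intensity halves per division) and is not otherwise lost or bleached.
    """
    if initial_intensity <= 0 or current_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(initial_intensity / current_intensity)


if __name__ == "__main__":
    print(divisions_from_dye(8000.0, 1000.0))  # 3.0
```

A population whose estimate stays near zero over an interval longer than the normal cell cycle time is consistent with an agent that blocks proliferation.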

In another preferred embodiment, the proliferation assay is an antimetabolite assay. In general,
antimetabolite assays are most useful when agents causing cell cycle arrest at the G1 or G2 resting
phase are desired. In an antimetabolite assay, the use of a toxic metabolite that kills dividing cells
results in survival of only those cells that are not dividing. Suitable antimetabolites include, but are not
limited to, standard chemotherapeutic agents such as methotrexate, cisplatin, taxol, hydroxyurea, and
nucleotide analogs (e.g., AraC). In addition, antimetabolite assays may include the use of genes that
cause cell death upon expression.

The concentration at which the antimetabolite is added will depend on the toxicity of the particular
antimetabolite, and will be determined as is known in the art. The antimetabolite is added and the
cells are generally incubated for some period of time; again, the exact period of time will depend on
the characteristics and identity of the antimetabolite as well as the cell cycle time of the particular cell
population. Generally, the incubation time is sufficient for at least one cell division. In a preferred
embodiment, at least one proliferation assay is done, with more than one being preferred.

In another preferred embodiment, either after or simultaneously with one or more of the proliferation
assays outlined above, at least one cell phase assay is done. By "cell phase" assay herein is meant
an assay that determines at which cell phase cell cycle arrest takes place, i.e., M, G1, S, or G2.

In one preferred embodiment, the cell phase assay is a DNA binding assay. When inside the cell, the
dye binds to DNA, generally by intercalation, although in some cases the dyes can be either major or
minor groove binding compounds. Thus, the amount of dye is directly correlated to the amount of
DNA in the cell, which varies with cell phase; G2 and M phase cells have twice the DNA content of G1
phase cells, and S phase cells have an intermediate amount. Suitable DNA binding dyes include, but
are not limited to, Hoechst 33342 and 33258, acridine orange, 7-AAD, LDS 751, DAPI, and SYTO 16
(see Molecular Probes Handbook, supra, Chapters 8 and 16 in particular).

In general, the DNA binding dyes are added in concentrations ranging from about 1 µg/ml to about 5
µg/ml. The dyes are added to the cells and allowed to incubate for some period of time; the length of
time will depend in part on the dye chosen. In one embodiment, measurements are taken
immediately after addition of the dye. The cells are then sorted as outlined below to create
populations of cells that contain different amounts of dye, and thus different amounts of DNA; in this
way, cells that are replicating are separated from those that are not. As will be appreciated by those in
the art, in some cases, for example when screening for anti-proliferation agents, cells with the least
fluorescence (and thus a single copy of the genome) can be separated from those that are replicating,
since the replicating cells contain more than a single genome of DNA. Alterations are determined by
measuring the fluorescence at different time points or in different cell populations, and comparing
the determinations to one another or to standards.
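The relationship between DNA content and cell phase described above can be turned into a simple classifier that expresses each cell's DNA-dye signal relative to the G1 peak (G1 about 1x, G2/M about 2x, S in between). The tolerance window and the labels for events outside the G1 to G2/M range are hypothetical choices for illustration; in practice the gates are set from the measured histogram.

```python
def cell_phase(dna_signal: float, g1_peak: float, tolerance: float = 0.1) -> str:
    """Assign a cell phase from DNA-dye fluorescence relative to the G1 peak.

    tolerance: hypothetical fractional window around the 1x (G1) and 2x (G2/M) peaks
    """
    if g1_peak <= 0:
        raise ValueError("G1 peak must be positive")
    ratio = dna_signal / g1_peak
    if ratio < 1 - tolerance:
        return "below G1"
    if ratio <= 1 + tolerance:
        return "G1"
    if ratio < 2 * (1 - tolerance):
        return "S"  # intermediate DNA content: replication in progress
    if ratio <= 2 * (1 + tolerance):
        return "G2/M"
    return "above G2/M"


if __name__ == "__main__":
    for signal in (95.0, 150.0, 205.0):
        print(signal, cell_phase(signal, g1_peak=100.0))
    # 95.0 G1
    # 150.0 S
    # 205.0 G2/M
```

A DNA-content dye alone cannot separate G2 from M, since both have 2x DNA, which is why the text treats them together.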

In a preferred embodiment, the cell phase assay is a cyclin destruction assay. In this embodiment,
prior to screening (and generally prior to the introduction of a candidate bioactive agent), a fusion
nucleic acid is introduced into the cells. The fusion nucleic acid expresses a fusion protein comprising a
cyclin destruction box and a detectable molecule. "Cyclin destruction boxes" are known in the art and
are sequences that cause destruction, via the ubiquitination pathway, of destruction box-containing
proteins during particular cell phases. That is, for example, G1 cyclins may be stable during G1 phase
but degraded during S phase due to the presence of a G1 cyclin destruction box. Thus, by linking a
cyclin destruction box to a detectable molecule, for example green fluorescent protein, the presence
or absence of the detectable molecule can serve to identify the cell phase of the cell population. In a
preferred embodiment, multiple boxes are used, preferably each fused to a distinguishable fluorescent
protein, such that the cell phase can be detected.

A number of cyclin destruction boxes are known in the art. For example, cyclin A has a destruction
box comprising the sequence RTVLGVIGD, while the destruction box of cyclin B1 comprises the
sequence RTALGDIGN (Glotzer et al., Nature 349: 132-138 (1991)). Other destruction boxes are
known as well: YMTVSIIDRFMQDSCVPKKMLQLVGVT (rat cyclin B);
KFRLLQETMYMTVSIIDRFMQNSCVPKK (SEQ ID NO:57); RAILIDWLIQVQMKFRLLQETMYMTVS
(mouse cyclin B1); DRFLQAQLVCRKKLQVVGITALLLASK (mouse cyclin B2); and
MSVLRGKLQLVGTAAMLL (mouse cyclin A2). These cyclin destruction boxes are operably linked to
nucleic acid encoding a detectable molecule to generate fusion proteins, as described above.
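Candidate destruction-box regions in a protein sequence can be located by motif search before building reporter fusions. This sketch assumes the minimal R-x-x-L core commonly cited for mitotic destruction boxes (not a definition given in this document) and uses a regular-expression lookahead so that overlapping matches are all reported; it only locates motif occurrences and does not predict whether a site is functional.

```python
import re

# Minimal destruction-box core R-x-x-L (assumed consensus). The lookahead makes
# finditer report overlapping occurrences instead of skipping past each match.
DBOX_CORE = re.compile(r"(?=(R..L))")


def find_dbox_cores(protein: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched core) for every R-x-x-L in `protein`."""
    return [(m.start(), m.group(1)) for m in DBOX_CORE.finditer(protein.upper())]


if __name__ == "__main__":
    for name, seq in [("cyclin A", "RTVLGVIGD"), ("cyclin B1", "RTALGDIGN")]:
        print(name, find_dbox_cores(seq))
    # cyclin A [(0, 'RTVL')]
    # cyclin B1 [(0, 'RTAL')]
```

Both destruction boxes cited above begin with this core, so a hit marks a candidate region to fuse to the detectable molecule, not a confirmed destruction signal.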

In a preferred embodiment, the cell cycle analysis further comprises a cell viability assay to ensure
that a lack of cellular change is due to the experimental conditions. Suitable viability assays
include, but are not limited to, light scattering, viability dye staining, and exclusion dye staining.

In a preferred embodiment, the viability assay is a light scattering assay, which is well known in the art.
Cells have particular forward and side (90 degree) scatter properties representing the size, shape and
granule content of the cells. Briefly, the scatter properties are affected by two parameters: side scatter
is affected by DNA condensation in dead and dying cells, and forward scatter is affected by the state of
membrane blebbing. Changes in the intensity of light scattering or in the cell refractive index indicate
alterations in viability. In a preferred embodiment, evaluating a live cell population of a particular cell
type provides characteristic forward and side scatter properties for comparison to other cell
populations.
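The comparison against a reference live population can be sketched as a scatter gate. This minimal Python sketch assumes a simple rectangular gate on forward scatter (FSC) and side scatter (SSC); the class and function names and the `margin` parameter are hypothetical, and cytometry software normally uses polygon or density-based gates instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One cytometer event; scatter values are in arbitrary instrument units."""
    fsc: float  # forward scatter
    ssc: float  # side (90 degree) scatter


@dataclass(frozen=True)
class ScatterGate:
    """Rectangular forward/side scatter gate (a simplification of polygon gates)."""
    fsc_min: float
    fsc_max: float
    ssc_min: float
    ssc_max: float

    def contains(self, e: Event) -> bool:
        return (self.fsc_min <= e.fsc <= self.fsc_max
                and self.ssc_min <= e.ssc <= self.ssc_max)


def gate_from_reference(live: list[Event], margin: float = 0.1) -> ScatterGate:
    """Derive a gate from a reference live population, widened by a fractional margin."""
    if not live:
        raise ValueError("reference population is empty")
    fsc = [e.fsc for e in live]
    ssc = [e.ssc for e in live]
    return ScatterGate(
        fsc_min=min(fsc) * (1 - margin),
        fsc_max=max(fsc) * (1 + margin),
        ssc_min=min(ssc) * (1 - margin),
        ssc_max=max(ssc) * (1 + margin),
    )


def fraction_in_gate(events: list[Event], gate: ScatterGate) -> float:
    """Fraction of events whose scatter falls inside the live-cell gate."""
    if not events:
        return 0.0
    return sum(gate.contains(e) for e in events) / len(events)


if __name__ == "__main__":
    reference = [Event(100, 50), Event(120, 60), Event(110, 55)]  # hypothetical live cells
    gate = gate_from_reference(reference, margin=0.0)
    sample = [Event(115, 52), Event(40, 90)]  # second event lies outside the gate
    print(fraction_in_gate(sample, gate))
```

A drop in the in-gate fraction of a treated population relative to the reference would then flag a possible loss of viability.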

In another preferred embodiment, the viability assay uses a viability dye. These dyes stain dead or
dying cells but not growing cells. For example, Annexin V displays divalent ion-dependent binding to
the phospholipid phosphatidylserine, whose presence on the cell surface is an early signal of
apoptosis. Other suitable viability dyes include, but are not limited to, ethidium homodimer-1, DEAD
Red, propidium iodide, SYTOX Green, etc., and others known in the art (see Molecular Probes, supra,
"Apoptosis Assay," pg 285, and Chapter 16, hereby incorporated by reference). Preferably, the
viability dye concentration used is about 100 ng/ml to about 500 ng/ml, more preferably from
about 500 ng/ml to about 1 µg/ml, with from about 1 µg/ml to about 5 µg/ml being particularly
preferred. In a preferred embodiment, the dye is directly
labeled. For example, Annexin V may be labeled with a fluorophore such as fluorescein isothiocyanate
(FITC), Alexa dyes, TRITC, AMCA, APC, tri-color, Cy-5, and others known in the art. In an alternative
preferred embodiment, the viability dye is labeled with a first label (e.g., hapten or biotin), and a
secondary fluorescent label is used to detect the first label.
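Reaching the dye concentrations listed above from a concentrated stock is a C1V1 = C2V2 calculation. The sketch below assumes a hypothetical 1 mg/ml (1000 µg/ml) stock and a 1 ml final volume; the function name and the stock concentration are illustrative, not taken from the source.

```python
def stock_volume_ul(stock_ug_per_ml: float, target_ng_per_ml: float,
                    final_volume_ml: float) -> float:
    """Microliters of dye stock to add so that C_stock * V_stock = C_final * V_final."""
    target_ug_per_ml = target_ng_per_ml / 1000.0  # 1 ug = 1000 ng
    if target_ug_per_ml > stock_ug_per_ml:
        raise ValueError("target concentration exceeds the stock concentration")
    volume_ml = target_ug_per_ml * final_volume_ml / stock_ug_per_ml
    return volume_ml * 1000.0  # 1 ml = 1000 ul


if __name__ == "__main__":
    # Hypothetical 1 mg/ml stock diluted into 1 ml of cell suspension.
    for target in (100, 500, 1000, 5000):  # ng/ml, spanning the ranges above
        print(f"{target} ng/ml -> {stock_volume_ul(1000.0, target, 1.0):.2f} ul")
```

Volumes below about 1 µl are difficult to pipette accurately, so preparing an intermediate dilution of the stock is a common choice.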

In another preferred embodiment, the viability assay is a dye exclusion assay. Exclusion dyes rely on
exclusion of the dye from living cells but entry into permeable dead or dying cells. Generally, the
exclusion dyes bind to DNA and fluoresce, but fluoresce poorly when not bound to DNA.
Alternatively, exclusion dyes are detected using a secondary label. Preferred exclusion dyes include,
but are not limited to, ethidium bromide, ethidium homodimer-1, propidium iodide, SYTOX
Green, calcein AM, BCECF AM, fluorescein diacetate, TOTO, and TO-PRO (see Molecular Probes,
supra) and others known in the art. These dyes are added to cells at a concentration of about 100
ng/ml to about 500 ng/ml, more preferably, about 500 ng/ml to about 1 µg/ml, and most preferably,
from about 0.1 µg/ml to about 5 µg/ml, with about 0.5 µg/ml being particularly preferred. In addition,
other cell viability assays are used, including assays that measure extracellular (e.g., proteases) or
intracellular (e.g., mitochondrial enzymes) enzymes of live and dead cells.

In a preferred embodiment, at least one cell viability assay is run, with at least two different cell viability
assays being preferred. When only one viability assay is run, a preferred embodiment uses light
scattering assays (both forward and side scatter). When two viability assays are run, preferred
embodiments use light scattering and dye exclusion or light scattering and viability dye staining. In
some cases, all three assays are used.
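The assay combinations described above can be expressed as a single live/dead call. The sketch below assumes that light scattering is always run (as in every preferred combination), that a dye result of `None` means that assay was not run, and that a positive exclusion-dye or viability-dye stain marks a dead or dying cell; the function name `is_viable` is hypothetical.

```python
from typing import Optional


def is_viable(in_scatter_gate: bool,
              exclusion_dye_positive: Optional[bool] = None,
              viability_dye_positive: Optional[bool] = None) -> bool:
    """Call a cell viable only if it passes every assay that was actually run."""
    # Light scattering: the cell must fall inside the live-cell scatter gate.
    if not in_scatter_gate:
        return False
    # Dye assays: None = not run; True = stained, i.e. dead or dying.
    for stained in (exclusion_dye_positive, viability_dye_positive):
        if stained:
            return False
    return True


if __name__ == "__main__":
    print(is_viable(True))               # light scattering only
    print(is_viable(True, False))        # light scattering + dye exclusion
    print(is_viable(True, False, True))  # all three assays; viability dye positive
```

Requiring agreement across every assay that was run is the conservative choice; a cell flagged by any one assay is treated as non-viable.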

Thus, in a preferred embodiment, cell cycle assays comprise sorting cells in a FACS by assaying
several different cellular parameters, including, but not limited to, cell viability, cell proliferation, cell
phase, and appropriate combinations thereof. The results from one or more of the assays are
compared to those of cells not exposed to the candidate bioactive agent.

In the present invention, other cellular assays are combined with the cell cycle assay.
These include assays of cellular parameters such as cell shape, redox state, DNA content, nucleic acid sequence,
chromatin structure, RNA content, total protein, antigens, lipids, surface proteins, intracellular
receptors, oxidative metabolism, DNA synthesis, degradation, intracellular pH, etc. In a preferred
embodiment, each of these measurements is determined simultaneously or sequentially using FACS
(i.e., multiparameter FACS). By using more than one parameter to detect the cell cycle, background is
reduced and specificity is increased. In one aspect, the cells are sorted at high speeds, for example
greater than about 5,000 sorting events/s, with greater than about 10,000 sorting events/s being
preferred, greater than about 25,000 sorting events/s being particularly preferred, and
greater than about 50,000 to 100,000 sorting events/s being especially preferred.
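The practical weight of these sort rates is easiest to see as arithmetic. The sketch below computes the time needed to pass a library of cells through the sorter at each rate named above; the library size of 10^9 cells is a hypothetical value, and the calculation counts analyzed events only, ignoring instrument dead time, aborted sorts, and purity-mode losses.

```python
def sort_time_hours(n_cells: float, events_per_second: float) -> float:
    """Hours needed to process n_cells at a sustained event rate."""
    if events_per_second <= 0:
        raise ValueError("event rate must be positive")
    return n_cells / events_per_second / 3600.0


if __name__ == "__main__":
    library_size = 1e9  # hypothetical number of cells to screen
    for rate in (5_000, 10_000, 25_000, 50_000, 100_000):
        print(f"{rate:>7} events/s: {sort_time_hours(library_size, rate):6.1f} h")
```

For this hypothetical library, 5,000 events/s needs about 56 hours, whereas 100,000 events/s needs under 3 hours.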

In another preferred embodiment, the present methods are useful in cancer applications. The ability
to rapidly and specifically kill tumor cells is a cornerstone of cancer chemotherapy. In general, using
the methods of the present invention, the fusion nucleic acids of the present invention can be
introduced into any tumor cell (primary or cultured) to identify bioactive agents that can induce
apoptosis, cell death, loss of cell division, or decreased cell growth. The methods of the present
invention can be combined with other cancer therapeutics (e.g., drugs or radiation) to sensitize the
cells, and thus induce rapid and specific apoptosis, cell death or decreased growth after exposure to
a secondary agent. Similarly, the present invention may be used in conjunction with known cancer
therapeutics to screen for agonists that make the therapeutic treatments more effective or less toxic.
This is particularly preferred when the chemotherapeutic agent is difficult or expensive to produce,
such as taxol.

In a preferred embodiment, the present invention is used to identify candidate agents that alter the
transformed phenotype of cancer cells. It is well known that oncogenes such as v-Abl, v-Src, v-Ras,
and others induce a transformed phenotype leading to abnormal cell growth when transfected into
certain cell types. Loss of growth control is also a major problem associated with metastasis of
transformed cells. Thus, in a preferred embodiment, susceptible, non-transformed cells can be
transformed with these oncogenes, and then candidate agents introduced into these cells to select for
bioactive agents which reverse or correct the transformed state.

One of the identifying features of oncogenic transformation is a loss of contact inhibition and the ability
to grow in soft agar. This characteristic provides one method for identifying candidate agents that
alter the transformed phenotype of tumor cells. In this assay, transforming viruses are constructed
containing v-Abl, v-Src, or v-Ras, a separation sequence, and a puromycin selection gene. Following
introduction of the viral constructs into NIH 3T3 cells, the cells are subjected to puromycin selection.
The NIH 3T3 cells hypertransform and detach from the plate, which allows their removal by washing
with fresh medium. This feature can serve as the basis for a screen, since cells that express a bioactive
agent altering this phenotype will remain attached to the plate and form colonies.

Similarly, the growth and/or spread of certain tumor cell types is enhanced by stimulatory responses
from growth factors and cytokines (e.g., PDGF, EGF, heregulin, and others), which bind to receptors
on the surfaces of specific tumors. In a preferred embodiment, the present invention is used to
identify candidate agents capable of blocking the ability of growth factors or cytokines to stimulate the
tumor cell. This screen comprises introducing the fusion nucleic acids expressing candidate agents,
followed by selecting for agents that block the binding, signaling, phenotypic and/or functional responses
of these tumor cells to the subject growth factor or cytokine.

Similarly, the spread of cancer cells by tumor cell invasion or metastasis presents a significant
problem for the success of cancer therapies. The ability to restrict or inhibit the migration of specific tumor
cells would provide a significant advance in the therapy of cancer. Tumor cells known to have high
metastatic potential can have candidate agents introduced into them, and agents selected that inhibit
migratory or invasive activity of the tumor cells. The present invention provides compositions for
following the migration of cells, for example by expressing rGFP or pGFP in cells and examining
invasive activity. Alternatively, the rGFP or pGFP fusion proteins are used to monitor cellular
components involved in cell migration, such as cellular actin or focal adhesion proteins. Candidate
agents may be introduced into these cells to identify agents that affect the invasive or metastatic
properties of the tumor cells. These and other applications targeting the metastatic
phenotype could allow specific inhibition of metastasis. This may include, for example, candidate
agents that upregulate the metastasis suppressor gene NM23, which codes for a nucleoside
diphosphate kinase. Peptides that counteract oncogenes, such as v-Mos, v-Raf, A-Raf, v-Src, v-Fes,
and v-Fms, or that inhibit the release or activity of matrix metalloproteinases would also act as
anti-metastatic agents.

In a preferred embodiment, the present invention finds use in immunologic and inflammatory
applications. Selective regulation of T lymphocytes is a desired goal for modulating immune-mediated
diseases. Thus, candidate agents of the present invention can be introduced into specific T-cell
subsets (TH1, TH2, CD4+, CD8+, etc.) and examined for characteristic responses, for example
cytokine generation, cytotoxicity, proliferation, and others. Agents can be selected that increase or
decrease the known T-cell physiologic response. For monitoring these responses, the present
invention may also provide markers of physiologic response, for example by operably linking rGFP or
pGFP to promoters of cytokines that are regulated as part of the immune response.
Candidate agents that affect regulation of the cytokine promoters can be screened on the basis of
expression of rGFP or pGFP. These approaches will be useful in any number of conditions, including:
1) autoimmune disease states where inducing a tolerant state is desirable; 2) allergic diseases where
decreasing the stimulation of IgE-producing cells is desirable (e.g., blocking release from T-cell
subsets of specific B-cell stimulating cytokines that induce the switch to IgE production); 3) transplantation
of organs where it is desirable to induce selective immunosuppression or prolong functioning of the
transplanted organ; 4) lymphoproliferative states, for inhibiting growth or sensitizing a specific T-cell
tumor to chemotherapy and/or radiation; 5) tumor surveillance, for inhibiting the elimination of
cytotoxic T-cells by Fas ligand-bearing tumor cells; and 6) T-cell mediated autoimmune or
inflammatory diseases such as rheumatoid arthritis, multiple sclerosis, inflammatory bowel disease,
myasthenia gravis, systemic lupus erythematosus, early onset diabetes, etc.

In a preferred embodiment, the present invention is applicable in selective modulation of B-cell
response. Activation of B-cells initiates various facets of humoral immunity, including immunoglobulin
synthesis and antigen presentation by B-cells. Activation is mediated by engagement of the B-cell
receptor (BCR), for example by binding of anti-IgM F(ab') fragments. Activation induces several signal
transduction pathways leading to various B-cell responses, including apoptosis, expression of the cell
surface marker CD69, and modulation of IgH promoter activity. Thus, in a preferred embodiment,
candidate agents comprising the fusion nucleic acids of the present invention are introduced into
appropriate B-cell lines, such as Ramos human B-cell lines, M12.4, etc., to identify candidate agents
affecting the signaling pathways activated by B-cell receptor engagement. The assay may comprise
determining the level of the CD69 cell surface marker (e.g., by fluorescently labeled anti-CD69 antibody
and FACS selection of cells expressing high levels of CD69) or inhibition of the apoptotic pathway (i.e.,
inhibition of cell death) following receptor activation. In one aspect, the candidate agents may be
fusion nucleic acids expressing candidate peptides fused to rGFP or pGFP. These candidate agents
are introduced into cells in the form of vectors, preferably retroviral vectors when mammalian cells are
used. In another aspect, the candidate agents are nucleic acids, peptides, cDNAs, and genomic
DNAs expressed as a gene of interest using the fusion nucleic acids described herein.

In another aspect, the present invention finds use as indicators of B-cell receptor mediated signal
transduction. An IgH promoter may be operably linked to rGFP or pGFP, which allows monitoring of
BCR activation by providing a measure of IgH promoter activity. For example, the promoter reporter
construct may comprise a fusion nucleic acid comprising a first gene of interest comprising HBEGF,
a Type 2A separation sequence, and a second gene of interest comprising rGFP or pGFP fused to a
PEST sequence. Candidate agents are introduced into cells carrying this construct to identify agents
that activate or suppress BCR mediated signal transduction, as reflected in changes in IgH promoter
activity. Because HBEGF serves as the cell-surface receptor for diphtheria toxin, cells with high IgH
promoter activity express HBEGF and become sensitive to the toxin; cells that survive exposure to
diphtheria toxin and/or have low levels of GFP expression will therefore have low IgH promoter
activity. Expression of the candidate agents may be under the control of an
inducible promoter, such as tetP, thus limiting any detrimental effect of constitutively expressing
candidate agents.
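The reason for fusing the reporter to a PEST sequence can be shown with first-order decay: a PEST-tagged reporter is degraded faster, so its fluorescence falls soon after IgH promoter activity drops instead of lingering. The sketch assumes simple exponential decay after promoter shut-off; the half-lives used (24 h for a stable reporter, 2 h for a PEST-tagged one) are hypothetical placeholders, not measured values for rGFP or pGFP.

```python
def reporter_level(t_hours: float, initial: float, half_life_hours: float) -> float:
    """Reporter remaining t hours after synthesis stops: L(t) = L0 * 2**(-t / t_half)."""
    return initial * 2 ** (-t_hours / half_life_hours)


if __name__ == "__main__":
    STABLE_HALF_LIFE = 24.0  # hypothetical, hours
    PEST_HALF_LIFE = 2.0     # hypothetical, hours
    for t in (0, 2, 6, 12):
        stable = reporter_level(t, 100.0, STABLE_HALF_LIFE)
        pest = reporter_level(t, 100.0, PEST_HALF_LIFE)
        print(f"t={t:>2} h  stable={stable:5.1f}%  PEST={pest:5.1f}%")
```

Six hours after shut-off in this model, the stable reporter still retains about 84% of its signal, while the PEST-tagged reporter retains 12.5%, so only the latter reports the change promptly.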

In a preferred embodiment, the present invention is used in infectious disease applications. Viral
pathogens can produce chronic or acute infections leading to severe, disabling health effects, and
death. Pathogenic viruses, such as human immunodeficiency virus, cytomegalovirus, leukemia
viruses, hepatitis virus, and herpes virus, among others, are epidemic throughout the world. There is a
need for understanding the infection process and identifying agents affecting propagation of the virus.
In a preferred embodiment, the present invention is used to follow and track virus infection of cells.
This is done in a number of ways. In one aspect, rGFP or pGFP is fused to a protein synthesized by
the pathogenic organism. For viruses, fusions may be made to viral capsid or envelope proteins, since
these proteins can tolerate substantial modifications and still be incorporated into the viral particle.
The fusions allow monitoring of infected cells, tracking of synthesized viral particles in the cell, and
determining the presence of viral particles extruded from the cell. Other viral structural proteins
suitable for fusions include the tegument proteins, which form a structure generally located between
the capsid and the envelope. Alternatively, the fusion nucleic acid comprising the rGFP or pGFP gene is
inserted into the viral genome, for example by homologous recombination, such that expression is
driven by a viral promoter. Viral infection of cells results in expression of the reporter molecule, thus
allowing monitoring of the infection process.

Analogously, cell lines are constructed in which a viral promoter is operably linked to a fusion nucleic
acid comprising rGFP or pGFP. Upon infection of the cell by a virus, the viral promoter is activated,
resulting in fluorescent reporter gene expression. Consequently, expression of the GFP provides a
measure of viral infection. A variety of viral promoters may be used. These may include the immediate
early gene promoters of many viruses or the viral promoters present on the long terminal repeats of
pathogenic retroviruses (e.g., HIV). Cellular promoters modulated by viral infection may also be used.
The modified viruses and cells containing the described fusion nucleic acids are then used to identify
candidate agents capable of affecting the infection process, for example, agents capable of inhibiting
viral synthesis. The cells are contacted with candidate agents and infected with the modified viruses.
Candidate agents that lower the amount of virus produced or affect the promoters regulated by the
infection process can be identified. When the candidate agents are part of a fusion nucleic acid
comprising rGFP or pGFP, the reporter gene selected for tracking and examining the infection
process is a reporter distinguishable from rGFP or pGFP.

Many cellular pathogens are known to exist intracellularly. For example, mycobacteria, rickettsia,
Salmonella, Pneumocystis, Yersinia, Leishmania, Trypanosoma cruzi, and the like can persist and
replicate within cells such as macrophages and lymphocytes. In a manner similar to tagging
pathogenic viruses described above, the fusion nucleic acids comprising rGFP or pGFP are used to
mark or tag the pathogenic organism. As with viruses, marking or tagging these non-viral entities may
be done in a number of ways. In one aspect, a promoter active within
the organism, such as a promoter that regulates expression of a protein required for infection, is
operably linked to a fusion nucleic acid comprising rGFP or pGFP. These constructs are inserted into
the organism by various methods, for example by homologous recombination. Alternatively, the
expression vectors may be maintained extrachromosomally by expression of a selection gene
followed by treatment of the organism under selection conditions.

These marked or tagged organisms are used to infect appropriate cells or host organisms. The
infection process may be tracked by monitoring expression of the reporter gene. Cells harboring the
marked pathogens are readily identified. Candidate agents are contacted with these cells to identify
agents that affect the infection process. Bioactive candidate agents may be selected for their ability to
eliminate or kill the intracellular organisms, similar to the antibiotic peptide magainin. Other assays
include selecting for agents that prevent initial infection, confer resistance to infection of the host cell,
inhibit replication of the pathogen, or increase susceptibility of infected cells to destruction by host
defense mechanisms (e.g., immune response).

For example, some viruses use cellular receptors and receptor complexes to bind and enter cells. For
instance, HIV binds CD4 complexes, coronaviruses bind CD13, and measles virus binds CD46 receptors
to infect cells. It is desirable to identify agents that block viral infection in cells permissive for viral
infection. In a specific example, it is known that entry of HIV-1 into cells requires CD4 and a
co-receptor, which can be one of several seven-transmembrane G-protein coupled receptors. In the case
of macrophages, the co-receptor required for HIV infection is CCR-5. Individuals homozygous for a
mutant allele of CCR-5 are resistant to HIV infection, and natural ligands of CCR-5, for example the CC
chemokines RANTES, MIP1a and MIP1b, can confer CD8+ mediated resistance to HIV infection.
Thus, agents that inhibit interaction between the CD4/CCR-5 receptor complex and HIV are desirable.
In a preferred embodiment, the agents are inserted into the membrane and displayed extracellularly.
In one aspect, a library of candidate peptides may comprise epitope-tagged, glycine-serine tethered
peptides, forming a library of cyclized peptides of the general sequence CXXXXXXXXXXC or C-(X)n-C,
where C is cysteine and X is any amino acid. Cells expressing the CD4/CCR-5 complex are
contacted with a library of the candidate peptides and infected with the viruses described above. Cells
that are not infected with viruses are identified by FACS, and the candidate agents conferring resistance
to infection are identified. These agents are then further assayed for their ability to inhibit viral infection.
The candidate peptides may also be displayed on rGFP or pGFP scaffolds or expressed from a
fusion nucleic acid comprising a first gene of interest comprising a candidate peptide, a separation
sequence, and a second gene of interest comprising rGFP or pGFP.
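The C-(X)n-C library format can be sketched as follows. The code assumes that each of the 20 standard amino acids is equally likely at every variable position (real libraries are usually encoded with degenerate codons, which bias this distribution); the function names are hypothetical. The CXXXXXXXXXXC example above has ten variable positions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def theoretical_diversity(n_variable: int) -> int:
    """Distinct C-(X)n-C peptides if every variable position can be any of the 20 residues."""
    return len(AMINO_ACIDS) ** n_variable


def random_cyclic_peptide(n_variable: int, rng: random.Random) -> str:
    """One library member: a fixed cysteine at each end flanking n random residues."""
    core = "".join(rng.choice(AMINO_ACIDS) for _ in range(n_variable))
    return "C" + core + "C"


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    print(theoretical_diversity(10))
    print([random_cyclic_peptide(10, rng) for _ in range(3)])
```

For n = 10 the theoretical diversity is 20^10, about 1.0 x 10^13 sequences; even at 100,000 sorting events/s, examining that many cells would take roughly three years, so any practical library samples only a small fraction of the possible peptides.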

In another preferred embodiment, the present invention is used to find candidate agents affecting
separation sequences used in various biological processes, including, but not limited to, cell death,
viral pathogenesis, expression of cellular genes resulting in cell disease states, processing of cellular
proteins, mechanism of action of bacterial toxins (e.g., botulinum toxin), and the like. In one aspect,
when the separation sequences are protease recognition sequences, the fusion nucleic acids of the
present invention are used to express substrates to detect protease activity, as described above. The
substrates comprise fusions of protease recognition sequences to rGFP or pGFP. In another
embodiment, the protease substrates are based on rGFP or pGFP linked to another fluorescent
protein via a protease recognition sequence to generate substrates capable of undergoing FRET.
Preferably, the substrates are codon optimized for efficient expression in the organism in which they
are expressed, to maximize the signal. The protease site sequences include, among others, those recognized by
caspase proteases; viral proteases involved in polyprotein processing, for example the HIV protease;
proteases of bacterial toxins (e.g., botulinum toxin); proteases that process cellular proteins, especially
those related to disease states (e.g., secretase processing of β-amyloid and Notch proteins;
cathepsins, etc.); proteases regulating cell adhesion (e.g., metalloproteases associated with
extracellular matrix); proteases involved in blood coagulation, inflammation and wound healing; tumor
cell associated proteases; and the like. Importantly, these screens are also adaptable to identifying
candidate agents affecting IRES and Type 2A separation sequences. Of particular interest are
separation sequences involved in disease states, such as IRES elements involved in viral
pathogenesis (e.g., hepatitis C virus).
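The FRET-based protease substrate works because energy transfer falls steeply with donor-acceptor distance. The sketch below uses the standard Förster relation E = 1/(1 + (r/R0)^6) and, as a simplifying assumption, treats the measured ensemble efficiency as the intact-substrate efficiency weighted by the uncleaved fraction, with cleaved substrate contributing no transfer. The R0 and distance values are hypothetical, not measured for any rGFP or pGFP pair.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Forster relation: E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def fraction_cleaved(e_observed: float, e_intact: float) -> float:
    """Estimate cleaved fraction, assuming E_obs = (1 - f) * E_intact and E_cleaved = 0."""
    if e_intact <= 0:
        raise ValueError("intact substrate must show measurable FRET")
    f = 1.0 - e_observed / e_intact
    return min(max(f, 0.0), 1.0)  # clamp measurement noise into [0, 1]


if __name__ == "__main__":
    R0 = 5.0  # hypothetical Forster radius, nm
    e_intact = fret_efficiency(5.0, R0)  # hypothetical linker distance equal to R0
    print(e_intact)
    print(fraction_cleaved(0.2, e_intact))
```

Doubling the donor-acceptor distance from R0 to 2R0 drops E from 0.5 to 1/65, which is why cleavage of the linker, letting the two proteins diffuse apart, gives a large loss of transfer.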

In a preferred embodiment, the present invention finds applications in drug resistance or drug toxicity
mechanisms. Development of drug resistance in a variety of cell types limits the effectiveness of drug
therapies. For example, multidrug resistance in tumor cells leads to selection of drug-resistant tumor
cells, which leads to relapse, morbidity, and increased mortality in cancer patients. In one aspect, it is
desirable to prevent or limit drug resistance in cells to increase or prolong the effectiveness of
chemotherapeutic agents. In a preferred embodiment, fusion nucleic acids expressing candidate
agents are introduced into drug resistant cells, either primary or cultured. Agents are identified that
confer drug sensitivity when cells are exposed to a drug or to combinations of drugs. Cells may be
selected based on onset of apoptosis, changes in membrane permeability, release of intracellular
ions, and release of fluorescent markers. Cells in which multidrug resistance involves transporters
can be preloaded with fluorescent transporter substrates, and selection carried out for candidate
agents which block normal efflux of fluorescent drugs from these cells. Screening of candidate agents
affecting drug resistance is well suited for poorly characterized mechanisms of resistance. Identifying
candidate agents that increase susceptibility of these cells to drugs may provide a basis for identifying
the cellular targets and for rational design of peptide inhibitors of drug resistance pathways.
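Selection for agents that block efflux amounts to finding cells that retain more of the preloaded fluorescent substrate than untreated resistant controls. A minimal sketch, assuming a threshold of the control mean plus three standard deviations (a common but arbitrary choice) and per-cell fluorescence in arbitrary units; the function names and readings are hypothetical.

```python
import statistics


def retention_threshold(control: list[float], k: float = 3.0) -> float:
    """Gate = mean + k * SD of substrate fluorescence in untreated resistant cells."""
    return statistics.mean(control) + k * statistics.stdev(control)


def fraction_retaining(treated: list[float], threshold: float) -> float:
    """Fraction of treated cells retaining substrate above the control gate."""
    if not treated:
        return 0.0
    return sum(x > threshold for x in treated) / len(treated)


if __name__ == "__main__":
    control = [95.0, 100.0, 105.0]         # hypothetical control readings
    treated = [90.0, 150.0, 200.0, 110.0]  # hypothetical readings after a candidate agent
    gate = retention_threshold(control)
    print(gate, fraction_retaining(treated, gate))
```

Cells above the gate would be the ones sorted for further characterization of the candidate agent they carry.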

In another aspect, the present invention is used to identify cellular targets that regulate synthesis of
drug resistance proteins at the transcriptional or translational levels. In a preferred embodiment,
promoters of drug resistance proteins, such as multidrug resistance transporters, are operably linked
to fusion nucleic acids comprising rGFP or pGFP. Candidate agents, such as a library of small
molecules, random peptides, cDNAs, or genomic DNAs, are introduced into cells and screened for
their ability to regulate drug resistance protein gene transcription. Candidate agents that activate or
inhibit transcription are identified and used to design other inhibitors or to identify the cellular targets of
the candidate agents.

In another preferred embodiment, the fusion nucleic acids of the present invention are used to confer
a drug resistance phenotype in cells by expressing drug resistance proteins, for example multidrug
resistance transporters (e.g., P-glycoproteins). The drug resistance protein may be expressed from a
fusion nucleic acid comprising a first gene of interest, a separation sequence, and a second gene of
interest, where at least one of the genes of interest is the drug resistance gene and the other gene of
interest is a reporter gene, such as rGFP or pGFP. The GFP reporter allows for monitoring
expression of the drug resistance gene. Cells expressing these fusion nucleic acids are contacted
with candidate agents and screened for their ability to reduce drug resistance (i.e., increase drug
sensitivity).

In a preferred embodiment, the present invention is useful in Identifying candidate agents that bind 
specific cells, tissues and organs. Cells expressing libraries of candidate agents comprising rGFP or 
pGFP are contacted witii cells or introduced Into an organism. Candidate agents that bind to specific 
cells are selected, for example by FACS. These bioactive candidate agents are useful for targetirig 
coupled antibodies, enzymes, dmgs, imaging agents, and tiie like to particular cells or organs. 

In a preferred embodiment, the present invention provides compositions and methods utilizing rGFP
and/or pGFP and a chip device comprising integrated photodetectors at individual loci. The method
may be practiced with any suitable chip device that includes an electronic circuit capable of reading
the sensed signal generated by each photodetector and generating output data signals therefrom.
The output data signals are indicative of the light emitted, due to the presence of rGFP or pGFP, at the
various loci. As will be appreciated by those in the art, any assay that evaluates binding interactions
can utilize the present invention. Examples of binding interactions include protein interaction domains,
receptors and ligands, drugs and drug targets, enzymes and inhibitors, nucleic acid sequences and
nucleic acid binding proteins, and binding of candidate agents, for example when expressed on a cell
surface, to any of the binding partners above.
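The per-locus readout can be sketched as background subtraction followed by a threshold call. This assumes each photodetector reports one intensity value, a single background level for the whole chip, and a user-chosen minimum net signal; the locus labels, values, and function name are hypothetical and do not describe any particular chip's circuitry.

```python
def call_positive_loci(signals: dict[str, float], background: float,
                       min_net_signal: float) -> list[str]:
    """Return loci whose background-subtracted rGFP/pGFP signal reaches the threshold."""
    return sorted(locus for locus, raw in signals.items()
                  if raw - background >= min_net_signal)


if __name__ == "__main__":
    # Hypothetical photodetector values for a 2 x 2 array of loci.
    readout = {"A1": 120.0, "A2": 30.0, "B1": 95.0, "B2": 22.0}
    print(call_positive_loci(readout, background=20.0, min_net_signal=70.0))
```

A per-locus background (for example, measured before the fluorescent binding partner is added) could replace the single chip-wide value where detector offsets differ.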

It is understood by the skilled artisan that the steps for constructing the fusion nucleic acids, retroviral
libraries, and transformed cells can be varied according to the options provided herein. It is also
understood, however, that the methods and examples in no way limit the true scope of the invention.
Those skilled in the art may make modifications according to the skill in the art.

The following examples serve to more fully describe the manner of using the above-described
invention, as well as to set forth the best modes contemplated for carrying out various aspects of the
invention. It is understood that these examples in no way serve to limit the true scope of the present
invention, but rather are presented for illustrative purposes. All references cited herein are
incorporated by reference in their entirety.

EXAMPLES
Example 1

Vector Construction and Expression in Mammalian Cells

Retroviral constructs are based on p96.7, a retroviral vector described in Lorens, J. et al. (2000) Mol.
Ther. 1: 438-47. The pCGFP vector carries a composite CMV promoter fused to the transcriptional
start site of the MMLV R-U5 region of the LTR; an extended packaging sequence; deletion of the
MMLV gag start ATG; a multiple cloning region containing EGFP (an Aequorea victoria GFP variant
codon optimized for expression in human cells; Clontech, Palo Alto, CA); and a Kozak consensus start
sequence, described in Kozak (1986) Cell 44: 283-292. The vector used to express flag-tagged
EGFP, pEf, was made by ligation of cFlag tag oligonucleotides onto XhoI/NotI-digested p96.7EGFP,
which is a vector identical to pCGFP except that it has additional restriction sites in the open reading
frame of EGFP, resulting in 8 non-optimized codons. The oligonucleotides used to make the flag-tagged
construct Ef are:

cFlag Forward, 5'-TCGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGGAGGAGGCCGCCAAGGCCGACTACAAGGACGACGACGACAAGTAGGCCCGTGAGGCCCTAAGC; and

cFlag Reverse, 5'-GGCCGCTTAGGGCCTCACGGGCCTACTTGTCGTCGTCGTCCTTGTAGTCGGCCTTGGCGGCCTCCTCCTTGTACAGCTCGTCCATGCCGAGAGTGATCCCGGCGGCGGTCACGAAC.
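Because the cFlag Forward and Reverse oligonucleotides are ligated into XhoI/NotI-digested vector, they should anneal into a duplex whose 5' overhangs match the XhoI (TCGA) and NotI (GGCC) cohesive ends. The sketch below checks this by comparing one oligo, minus its 4-nt overhang, to the reverse complement of the other; the function names are illustrative, and the 4-nt overhang length follows from those two enzymes' cut patterns.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

CFLAG_FORWARD = ("TCGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGGAGGAGGCCG"
                 "CCAAGGCCGACTACAAGGACGACGACGACAAGTAGGCCCGTGAGGCCCTAAGC")
CFLAG_REVERSE = ("GGCCGCTTAGGGCCTCACGGGCCTACTTGTCGTCGTCGTCCTTGTAGTCGGCCTTGGCGGCCT"
                 "CCTCCTTGTACAGCTCGTCCATGCCGAGAGTGATCCCGGCGGCGGTCACGAAC")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sticky_overhangs(top: str, bottom: str, overhang: int = 4) -> tuple[str, str]:
    """If the oligos anneal leaving 5' overhangs of `overhang` nt, return those overhangs."""
    if top[overhang:] != revcomp(bottom[overhang:]):
        raise ValueError("oligos do not form the expected duplex")
    return top[:overhang], bottom[:overhang]


if __name__ == "__main__":
    print(sticky_overhangs(CFLAG_FORWARD, CFLAG_REVERSE))
    # The FLAG epitope DYKDDDDK is encoded by GAC TAC AAG GAC GAC GAC GAC AAG.
    print("GACTACAAGGACGACGACGACAAG" in CFLAG_FORWARD)
```

The same comparison is a quick way to catch transcription errors when such sequences are re-entered from a printed listing.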

pR and pP are retroviral expression vectors comprising Renilla muelleri and Ptilosarcus gurneyi GFPs
codon optimized for expression in human cells (containing 9 and 11 non-optimized codons,
respectively). Each has a Kozak consensus start sequence and a backbone vector sequence
identical to that of p96.7EGFP. These vectors were made by annealing and ligating 20 synthetic
oligonucleotides (R1-R20 for Renilla muelleri, P1-P20 for Ptilosarcus gurneyi), followed by amplification
of the fragments by PCR, thus creating DNA fragments with the optimized codon sequences shown in
Figures 2 and 3. The amplified products were digested with EcoRI/NotI and cloned into EcoRI/NotI-digested
p96.7EGFP vector. The synthetic oligonucleotides used in construction of these vectors are as
follows:
R1, 5'-GCAGATCCTGAAGAACACCTGCCTGCAGGAGGTGATGAGCTACAAGGTGAACCTGGAGGGCATCGTTAACAA;
R2, 5'-CCACGTGTTCACCATGGAGGGCTGCGGCAAGGGCAACATCCTGTTCGGCAACCAATTGGTGCAGATCCGCGT;
R3, 5'-GACCAAGGGCGCCCCCCTGCCCTTCGCCTTCGACATCGTGAGCCCCGCCTTCCAGTACGGCAACCGTACGTT;
R4, 5'-CACCAAGTACCCCAACGACATCAGCGACTACTTCATCCAGAGCTTCCCCGCCGGCTTCATGTACGAGCGCAC;
R5, 5'-CCTGCGCTACGAGGACGGCGGCCTGGTGGAGATCCGCAGCGACATCAACCTGATCGAGGACAAGTTCGTGTA;
R6, 5'-CCGCGTGGAGTACAAGGGCAGCAACTTCCCCGACGACGGGCCCGTGATGCAGAAGACCATCCTGGGCATCGA;
R7, 5'-GCCCAGCTTCGAGGCCATGTAGATGAACAACGGCGTGCTGGTGGGCGAGGTGATCCTGGTGTACAAGCTTAA;
R8, 5'-CAGCGGCAAGTACTACAGCTGCCACATGAAGACCCTGATGAAGAGCAAGGGCGTGGTGAAGGAGTTCCCCAG;
R9, 5'-CTACCACTTCATCCAGGACCGCCTCGAGAAGACCTACGTGGAGGACGGCGGCTTCGTGGAGCAGCACGAGAC;
R10, 5'-CGCCATCGCCCAGATGACCAGCATCGGCAAGCCCCTGGGATCCCTGCA;
R11, 5'-TGCAGGGATCCCAGGGGCTTGCCGATGCTGGTCATCTGGGCGATGGCGGTCTCGTGCTGCTCCACGAAGCCGCCGTCCTCCACG;
R12, 5'-TAGGTCTTCTCGAGGCGGTGGTGGATGAAGTGGTAGCTGGGGAACTCCTTCACCACGCCCTTGCTCTTCATC;
R13, 5'-AGGGTCTTCATGTGGCAGCTGTAGTACTTGCCGCTGTTAAGCTTGTACACCAGGATCACCTCGCCCACCAGC;
R14, 5'-ACGCCGTTGTTCATGTACATGGCCTCGAAGCTGGGCTCGATGCCCAGGATGGTCTTCTGCATCACGGGCCCG;
R15, 5'-TCGTCGGGGAAGTTGCTGCCCTTGTACTCCACGCGGTACACGAAGTTGTCCTCGATCAGGTTGATGTCGCTG;
R16, 5'-CGGATCTCCAGCAGGCCGCCGTCCTCGTAGCGCAGGGTGCGCTCGTACATGAAGCCGGCGGGGAAGCTCTGG;
R17, 5'-ATGAAGTAGTCGCTGATGTCGTTGGGGTACTTGGTGAACGTACGGTTGCCGTACTGGAAGGCGGGGCTCACG;
R18, 5'-ATGTCGAAGGCGAAGGGCAGGGGGGCGCCCTTGGTCACGCGGATCTGCACCAATTGGTTGCCGAACAGGATG;
R19, 5'-TTGCCCTTGCCGGAGCCCTCCATGGTGAAGACGTGGTTGTTAACGATGCCCTCCAGGTTCACCTTGTAGCTC;
R20, 5'-ATCACCTCCTGCAGGCAGGTGTTCTTCAGGATCTGC;
P1, 5'-CAACGTGCTGAAGAACACCGGCCTGAAGGAGATCATGAGCGCCAAGGCCAGCGTGGAGGGCATCGTTAACAA;
P2, 5'-CCACGTGTTCAGCATGGAGGGCTTCGGCAAGGGCAACGTGCTGTTCGGCAACCAATTGATGCAGATCCGCGT;
P3, 5'-GACCAAGGGCGGCCCCCTGCCCTTCGCCTTCGACATCGTGAGCATCGCCTTCCAGTACGGCAACCGTACGTT;
P4, 5'-CACCAAGTACCCCGACGACATCGCCGACTACTTCGTGCAGAGCTTCCCCGCCGGCTTCTTCTACGAGCGCAA;
P5, 5'-CCTGCGCTTCGAGGACGGCGCCATCGTGGACATCCGCAGCGACATCAGCCTGGAGGACGACAAGTTCCACTA;
P6, 5'-CAAGGTGGAGTACCGCGGCAACGGCTTCCCCAGCAACGGGCCCGTGATGCAGAAGGCCATCCTGGGCATGGA;
P7, 5'-GCCCAGCTTCGAGGTGGTGTACATGAACAGCGGCGTGCTGGTGGGCGAGGTGGACCTGGTGTACAAGCTTGA;
P8, 5'-GAGCGGCAACTACTACAGCTGCCACATGAAGACCTTCTACCGTTCGAAGGGCGGCGTGAAGGAGTTCCCCGA;
P9, 5'-GTACCACTTCATCCACCACCGCCTCGAGAAGACCTACGTGGAGGAGGGCAGCTTCGTGGAGCAGCACGAGAC;
P10, 5'-CGCCATCGCCCAGCTGACCACCATCGGCAAGCCCCTGGGATCCCTGCA;
P11, 5'-TGCAGGGATCCCAGGGGCTTGCCGATGGTGGTCAGCTGGGCGATGGCGGTCTCGTGCTGCTCCACGAAGCTGCCCTCCTCCACG;
P12, 5'-TAGGTCTTCTCGAGGCGGTGGTGGATGAAGTGGTACTCGGGGAACTCCTTCACGCCGCCCTTCGAACGGTAG;
P13, 5'-AAGGTCTTCATGTGGCAGCTGTAGTAGTTGCCGCTCTCAAGCTTGTACACCAGGTCCACCTCGCCCACCAGC;
P14, 5'-ACGCCGCTGTTCATGTACACCACCTCGAAGCTGGGCTCCATGCCCAGGATGGCCTTCTGCATCACGGGCCCG;
P15, 5'-TTGCTGGGGAAGCCGTTGCCGCGGTACTCCACCTTGTAGTGGAACTTGTCGTCCTCCAGGCTGATGTCGCTG;
P16, 5'-CGGATGTCCACGATGGCGCCGTCCTCGAAGCGCAGGTTGCGCTCGTAGAAGAAGCCGGCGGGGAAGCTCTGC;
P17, 5'-ACGAAGTAGTCGGCGATGTCGTCGGGGTACTTGGTGAACGTACGGTTGCCGTACTGGAAGGCGATGCTCACG;
P18, 5'-ATGTCGAAGGCGAAGGGCAGGGGGCCGCCCTTGGTCACGCGGATCTGCATCAATTGGTTGCCGAACAGCACG;
P19, 5'-TTGCCCTTGCCGAAGCCCTCCATGCTGAACACGTGGTTGTTAACGATGCCCTCCACGCTGGCCTTGGCGCTC; and
P20, 5'-ATGATCTCCTTCAGGCCGGTGTTCTTCAGCACGTTG.
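Assembly of overlapping oligonucleotides depends on each reverse-strand oligonucleotide being the reverse complement of the ends of its forward-strand neighbours. The short Python sketch below checks this for R20 against the 5' end of R1, both copied from the list above; the helper names are illustrative and are not part of the original protocol.

```python
# Sketch: confirm that a reverse-strand assembly oligo base-pairs with a
# forward-strand oligo (R20 against the 5' end of R1).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overlap_position(forward: str, reverse: str) -> int:
    """Index in `forward` where `reverse` anneals perfectly, or -1 if it does not."""
    return forward.upper().find(reverse_complement(reverse))


R1 = ("GCAGATCCTGAAGAACACCTGCCTGCAGGAGGTGATGAGCTACAAGGTGAACCTGGAGGGCAT"
      "CGTTAACAA")
R20 = "ATCACCTCCTGCAGGCAGGTGTTCTTCAGGATCTGC"

print(overlap_position(R1, R20))  # 0: R20 pairs with the first 36 bases of R1
```

The same two functions can be applied to any forward/reverse pair; a result of -1 flags a pair that does not anneal perfectly, which is also a quick way to catch transcription errors in a sequence listing.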

Annealed, ligated fragments were PCR amplified with the respective primers:
R forward, 5'-GATCATAGAATTCGCCACCATGGGCAGCAAGCAGATCCTGAAGAACACCTGCCTG;
P forward, 5'-GATCATAGAATTCGCCACCATGGGCAACCGCAACGTGCTGAAGAACACCGGCCTG; and
R and P reverse, 5'-ATGATCGCGGCCGCTACACCCACTCGTGCAGGGATCCCAGGGGCTTGCCGATG.
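The forward primers carry an EcoRI site, and the common reverse primer carries NotI and BamHI sites. A minimal Python sketch that reports where these recognition sequences fall in a primer is shown below; the enzyme table holds the standard recognition sequences, and the function name is illustrative.

```python
# Sketch: report positions (0-based, 5'->3' on the given strand) of selected
# restriction-enzyme recognition sites in a primer.
SITES = {
    "EcoRI": "GAATTC",
    "NotI": "GCGGCCGC",
    "BamHI": "GGATCC",
}


def find_sites(seq: str, sites: dict = SITES) -> list:
    """Return (enzyme, start) pairs for every occurrence, sorted by position."""
    seq = seq.upper()
    hits = []
    for name, motif in sites.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((name, start))
            start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


R_FORWARD = "GATCATAGAATTCGCCACCATGGGCAGCAAGCAGATCCTGAAGAACACCTGCCTG"
RP_REVERSE = "ATGATCGCGGCCGCTACACCCACTCGTGCAGGGATCCCAGGGGCTTGCCGATG"

print(find_sites(R_FORWARD))   # [('EcoRI', 7)]
print(find_sites(RP_REVERSE))  # [('NotI', 6), ('BamHI', 31)]
```

All three recognition sequences are palindromic, so scanning one strand is enough here. A non-palindromic site would also have to be searched on the reverse complement.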

C-terminal flag tags were added to these GFPs by ligation of the annealed oligonucleotides
rm/pgFlag forward, 5'-GATCCCTGCACGAGTGGGTGGAGGAGGCCGCCAAGGCCGACTACAAGGACGACGACGACAAGTAGGCCCGTGAGGCCCTAAGC; and
rm/pgFlag reverse, 5'-GGCCGCTTAGGGCCTCACGGGCCTACTTGTCGTCGTCGTCCTTGTAGTCGGCCTTGGCGGCCTCCTCCACCCACTCGTGCAGG
into BamHI/NotI-digested vectors to create Rf and Pf.
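As a check on the reading frame, the forward flag oligonucleotide can be translated in silico. The sketch below uses the standard genetic code and assumes, as the GGA TCC codons in R10 and in the common reverse primer indicate, that the vector supplies the first G of a glycine codon at the BamHI junction, so the oligonucleotide is read in frame from its third base.

```python
# Sketch: translate the coding part of the forward flag oligonucleotide
# using the standard genetic code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate every complete codon of `dna`; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


FLAG_FORWARD = ("GATCCCTGCACGAGTGGGTGGAGGAGGCCGCCAAGGCCGACTACAAGGACGACGACGACAAG"
                "TAG")

# The oligo's leading "GA" completes the vector's GGA (Gly) codon at the
# BamHI junction, so the in-frame codons start at index 2.
print(translate(FLAG_FORWARD[2:]))  # SLHEWVEEAAKADYKDDDDK*
```

The translated product ends in the linker-flag sequence EEAAKA-DYKDDDDK followed by a stop codon, which is the C-terminal extension described in Experiment 2.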

pRcDNA was made by PCR amplifying the wild-type R. muelleri GFP cDNA from pET-34 Renilla muelleri GFP (Prolume Ltd., Pittsburgh, PA) with the primers
Forward, 5'-GATCATGAATTCGCCACCATGAGTAAACAAATATTGAAGAACACT; and
Reverse, 5'-TAGATCGCGGCCGCTTAAACCCATTCGTGTAAGGATCCTAGTGG,
and cloning the product into the EcoRI/NotI sites of p96.7EGFP.
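The wild-type and codon-optimized 5' sequences differ mainly at synonymous third codon positions. The sketch below compares overall and third-position (GC3) GC content for the coding portions, starting at ATG, of the wild-type Forward primer above and of the codon-optimized R forward primer. It illustrates one measurable consequence of the codon choices and is not the optimization procedure itself.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in `seq`."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(cds: str) -> float:
    """GC content at the third position of each complete codon."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    return sum(base in "GC" for base in thirds) / len(thirds)


WILD_TYPE = "ATGAGTAAACAAATATTGAAGAACACT"     # from the wild-type Forward primer
OPTIMIZED = "ATGGGCAGCAAGCAGATCCTGAAGAACACC"  # from the R forward primer

for label, seq in (("wild type", WILD_TYPE), ("codon optimized", OPTIMIZED)):
    print(f"{label}: GC = {gc_fraction(seq):.2f}, GC3 = {gc3_fraction(seq):.2f}")
# wild type: GC = 0.26, GC3 = 0.44
# codon optimized: GC = 0.53, GC3 = 1.00
```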

Vectors containing codon-optimized R. muelleri GFP with a linker-HA tag-linker sequence inserted into each of positions A-F were created by PCR ligation (see, for example, Norton, R.M. et al. (1989) Gene 77: 69-77) of two PCR-generated fragments, a 5' section fragment and a 3' section fragment. The 5' section of each construct was PCR amplified with the R forward primer, shown above, and a primer corresponding to each insertion construct:
A reverse, 5'-CTGGGGTAGTGGGGCAGGTGGTAGGGGTAGGGACGGGCGTGGCCGTCGTAGCGGAGGGTGGGGTGGTAG;
B reverse, 5'-GTGGCGTAGTCGGGGAGGTGGTAGGGGTAGCGACGGGCGTGGGGGTGGATGAGGTTGATGTGGGTGCGG;
C reverse, 5'-GTGGGGTAGTCGGGGAGGTCGTAGGGGTAGCGAGGGGCGTGGGGGTTGATGTACATGGGGTGGAAGGTG;
D reverse, 5'-CTGGGGTAGTCGGGCAGGTGGTAGGGGTAGCGAGGGCCGTGGGCGTTAAGGTTGTACAGGAGGATGACG;
E reverse, 5'-CTGGCGTAGTCGGGGAGGTGGTAGGGGTAGCGAGGGGCGTGGGCGCGGTTGGTGTTCATGAGGGTGTTG;
F reverse, 5'-GTGGCGTAGTGGGGCAGGTGGTAGGGGTAGCGAGGGGGGTGGGGGGGGGGGTGGTGGAGGTAGGTGTTG.

Similarly, the 3' section was generated with the R reverse primer and the corresponding primers:
A forward, 5'-GGTAGGAGGTGGGGGAGTAGGGGAGGGTGGGGGAAGGAGGTGGAGGGGAGGGCGGCGTGGTGGAGATGGGGA;
B forward, 5'-GGTAGGAGGTGGGGGAGTAGGGGAGGGTGGGGCAAGGAGGTGGAGGGGAGAAGTTGGTGTAGGGGGTGGAGT;
C forward, 5'-CCTACGACGTGCCCGACTACGCCAGCCTGGGCCAAGCAGGTGGAGGCAACGGCGTGCTGGTGGGCGAGGTGA;
D forward, 5'-CCTACGACGTGCCCGACTACGCCAGCCTGGGCCAAGCAGGTGGAGGCAGCGGCAAGTACTACAGCTGCCACA;
E forward, 5'-CCTACGACGTGCCCGACTACGCCAGCCTGGGCCAAGCAGGTGGAGGCGTGGTGAAGGAGTTCCCCAGCTACC;
F forward, 5'-CCTACGACGTGCCCGACTACGCCAGCCTGGGCCAAGCAGGTGGAGGC[...]GTGGAGC[...]GAGACCGCCA.

The PCR-generated fragments were inserted into the EcoRI/NotI sites of p96.7EGFP. C-terminal flag tags were added to these vectors in the same manner described above.
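PCR ligation (splicing by overlap extension) joins the 5' and 3' section fragments through the sequence they share, here the HA/linker region encoded at the 5' end of forward primers such as D and E above. The string-level sketch below only mimics the outcome of that reaction: it finds the longest suffix/prefix overlap and merges the two fragments. The fragment sequences in the example are short hypothetical stand-ins, not the actual PCR products.

```python
def splice_by_overlap(left: str, right: str, min_overlap: int = 15) -> str:
    """Merge two fragments whose ends overlap (3' end of `left` = 5' end of `right`).

    Tries the longest possible overlap first and raises ValueError if the
    fragments share fewer than `min_overlap` bases.
    """
    left, right = left.upper(), right.upper()
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError(f"no overlap of at least {min_overlap} bases")


# Hypothetical fragments sharing a 23-nt HA-coding overlap.
OVERLAP = "CCTACGACGTGCCCGACTACGCC"
five_prime_section = "ATGGGCAGCAAG" + OVERLAP
three_prime_section = OVERLAP + "AGCCTGGGCCAAGCAGGTGGAGGC"

merged = splice_by_overlap(five_prime_section, three_prime_section)
print(merged)
```

The `min_overlap` guard stands in for the practical requirement that the shared region be long enough to anneal stably during the sewing step.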

The bacterial expression vector for purification of Ptilosarcus GFP was created by PCR amplification of pP with the primers forward, 5'-AGATCATAGATCTATGGGCAACCGCAACGTGCTGAAGAACACCGGCCTG and P reverse (shown above), digestion with BglII/NotI, and ligation into the BamHI/NotI restriction sites of pGEX6P-1 (Pharmacia Biotech, Piscataway, New Jersey). The expression vector containing R. muelleri GFP was made by PCR ligation of two fragments: a fragment generated by annealing and extending the primers rmgGE forward, 5'-AGATCATAGATCTGAATTCATGGGCAGCAAGCAGATCCTGAAGAACACCGGCCTGCAGGAGGTGATGAGCTACAAGGTGAACCTGGAGG and nmgGE reverse, 5'-GCCGAACAGGATGTTGCCCTTGCCCTCGCCCTCCATGGTGAACACGTGGTTGTTAACGATGCCCTCCAGGTTCACCTTGTAGCTCATCAC; and a second fragment generated by PCR of pR with the primers mngGE forward and R reverse. The composite fragment was amplified with the primers G6nmg forward, 5'-AGATCATAGATCTGAATTCATGGG, and R reverse. The PCR-sewn product was digested with BglII/NotI and ligated into the BamHI/NotI sites of pGEX6P-1. This vector expresses R. muelleri GFP with C10G and C35E mutations to aid in the folding of the protein in bacteria.
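BglII (A^GATCT) and BamHI (G^GATCC) leave the same GATC overhang, so a BglII-cut insert ligates into a BamHI-cut vector and creates a GGATCT junction. The sketch below translates such junctions to show which residues remain on the GFP after PreScission cleavage (after LEVLFQ). The vector sequence upstream of the BamHI cut is an assumption based on the pGEX-6P-1 polylinker. The two insert 5' ends are hypothetical examples: BglII directly before the ATG, or BglII followed by an EcoRI site. Check both against the actual sequences before relying on the result.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {  # standard genetic code
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Assumption: pGEX-6P-1 sequence from the PreScission site to the BamHI cut,
# CTG GAA GTT CTG TTC CAG | GGG CCC CTG G^GATCC
VECTOR_UPSTREAM = "CTGGAAGTTCTGTTCCAGGGGCCCCTGG"


def translate(dna: str) -> str:
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def ligate_into_bamhi(insert: str) -> str:
    """Join a BglII-cut insert (A^GATCT) to the BamHI-cut vector end."""
    insert = insert.upper()
    site = insert.find("AGATCT")
    if site == -1:
        raise ValueError("insert has no BglII site")
    return VECTOR_UPSTREAM + insert[site + 1:]


def residues_before_gfp(junction: str) -> str:
    """Residues between the PreScission cut (after LEVLFQ) and the GFP Met."""
    protein = translate(junction)
    start = protein.index("LEVLFQ") + len("LEVLFQ")
    return protein[start:protein.index("M", start)]


ptilosarcus_insert = "AGATCT" + "ATGGGCAACCGCAACGTGCTG"         # hypothetical 5' end
renilla_insert = "AGATCT" + "GAATTC" + "ATGGGCAGCAAGCAGATC"    # hypothetical 5' end

print(residues_before_gfp(ligate_into_bamhi(ptilosarcus_insert)))  # GPLGS
print(residues_before_gfp(ligate_into_bamhi(renilla_insert)))      # GPLGSEF
```

Under these assumptions, the extra EcoRI site in the Renilla construct accounts for the two additional residues (EF) that distinguish GPLGSEF- from GPLGS-.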

Cells and Retrovirus Transduction

Phoenix retroviral packaging cells, described in Swift, S. et al. (1999) Current Protocols in Immunology (Coligan, J., Kruisbeek, A., Margulies, D., Shevach, E. and Strober, W., eds), Vol. 10.17C, John Wiley and Sons, Inc., New York, pp. 1-17, were maintained in Dulbecco's modified Eagle medium (Mediatech Cellgro, Herndon, VA) with 10% fetal bovine serum and 1% penicillin-streptomycin. Jurkat-E cells stably expressing the ecotropic receptor were cultured in RPMI 1640 medium (JRH Bioscience, Williamsburg, VA) supplemented with 10% fetal calf serum plus 1% penicillin-streptomycin. Calcium phosphate transfection of Phoenix cells and infection of Jurkat-E cells were carried out as described in Swift et al., supra.
Gel Filtration 

Gel filtration was carried out on a 1 x 30 cm Pharmacia Superdex 75 column, equilibrated in phosphate-buffered saline and eluted at 0.3 ml/min at 22°C. The column was on a Hewlett-Packard 1100 HPLC system equipped with a standard fluorescence detector with an dpi flow cell. GFP peaks were detected by absorption at 489 nm or by fluorescence emission at 512 nm. Fluorescence excitation spectra were recorded at a fixed emission wavelength of 549 nm, and emission spectra were recorded at a fixed excitation wavelength of 450 nm.
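For planning runs it can help to convert the column dimensions into a bed volume and an elution time. The sketch below assumes that "1 x 30 cm" means a 1 cm inner diameter and a 30 cm bed length. It also ignores the difference between the total bed volume and the elution volume of any particular protein.

```python
import math


def bed_volume_ml(inner_diameter_cm: float, bed_length_cm: float) -> float:
    """Cylinder volume in cm^3, which is the same as ml."""
    return math.pi * (inner_diameter_cm / 2) ** 2 * bed_length_cm


def minutes_per_column_volume(volume_ml: float, flow_ml_per_min: float) -> float:
    return volume_ml / flow_ml_per_min


volume = bed_volume_ml(1.0, 30.0)
minutes = minutes_per_column_volume(volume, 0.3)
print(f"{volume:.1f} ml, {minutes:.0f} min per column volume")
# 23.6 ml, 79 min per column volume
```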

FACS and Microscopy

Flow cytometry analysis and cell sorting of GFP-expressing cells were performed on a FACScan (Becton Dickinson, San Jose, CA) or MoFlo (Cytomation, Fort Collins, CO) instrument, and data were analyzed using FlowJo software (Treestar Software, San Carlos, CA). Live cells were gated by scatter and propidium iodide staining during data analysis. GFP fluorescence intensity measurements (geometric mean) were of GFP-positive cells only. Cells expressing GFP were visualized using a Nikon Eclipse TE300 fluorescence microscope.
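The gating described above can be sketched in a few lines of Python: live cells by propidium iodide exclusion, then GFP-positive cells only, summarized by the geometric mean. The event values, the GFP threshold and the per-event PI flag are hypothetical simplifications. Real analyses set gates on scatter and PI plots in FlowJo rather than with a single cutoff.

```python
import math


def geometric_mean(values):
    values = list(values)
    if not values or min(values) <= 0:
        raise ValueError("geometric mean needs at least one positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gated_gfp_geomean(events, gfp_threshold):
    """events: iterable of (fl1_intensity, pi_positive) pairs.

    Keeps live (PI-negative) cells above the GFP threshold, then takes the
    geometric mean of their FL1 values.
    """
    positives = [fl1 for fl1, pi_positive in events
                 if not pi_positive and fl1 > gfp_threshold]
    return geometric_mean(positives)


# Hypothetical events: (FL1 intensity, propidium iodide positive?)
sample_a = [(3, False), (100, False), (1000, False), (5000, True)]
sample_b = [(4, False), (200, False), (2000, False), (50, True)]

a = gated_gfp_geomean(sample_a, gfp_threshold=10)  # sqrt(100 * 1000)
b = gated_gfp_geomean(sample_b, gfp_threshold=10)  # sqrt(200 * 2000)
print(f"fold difference: {b / a:.2f}")  # 2.00
```

Fold differences such as "28-fold" or "1.4-fold" in the results are ratios of gated geometric means of this kind.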

Immunoprecipitation and Western Analysis

For preparation of whole-cell lysates, cells were counted, collected, washed in PBS, and lysed by freeze-thaw/vortexing in lysis buffer (50 mM HEPES pH 7.4, 150 mM NaCl, 5 mM EDTA, 5 mM EGTA, 1% Triton X-100) with added complete protease inhibitor cocktail (Boehringer Mannheim, Chicago, IL). Lysate cleared by centrifugation was resolved on 4-12% NuPAGE SDS polyacrylamide gels (Novex, San Diego, CA) as per the manufacturer's recommendations. For immunoprecipitations, antibody-conjugated agarose beads were added to the cell lysate and incubated for 4 h. The beads were washed in lysis buffer and the samples separated by SDS-PAGE as above. Samples transferred to PVDF membranes were blocked overnight at 4°C using PBS buffer containing 10% milk and 0.1% Tween 20. Primary antibodies (polyclonal flag-probe, Santa Cruz Biotechnology, Santa Cruz, CA) were used at a 1:2000 dilution, and secondary antibodies at a 1:5000 dilution. Membranes were developed using the ECL Plus enhanced chemiluminescence kit (Amersham Pharmacia, Piscataway, NJ) and detected using Hyperfilm ECL film (Amersham Life Sciences, Buckinghamshire, UK). For comparative Western blot analysis, GFPs containing a C-terminal flag tag were used. Exposed film was scanned with a Hewlett Packard (Palo Alto, CA) ScanJet 4C scanner, and band intensities were integrated using the program NIH Image (see http://rsb.info.nih.gov/nih-image/about.html).
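NIH Image's own gel-analysis routine is not reproduced here. As a rough illustration of what an integrated band intensity is, the sketch below sums background-subtracted values across a one-dimensional lane profile. It makes two simplifying assumptions: the scanned values already increase with band darkness, and the profile minimum is used as the background.

```python
def integrated_band_intensity(profile, baseline=None):
    """Sum of background-subtracted values across a band's lane profile."""
    if baseline is None:
        baseline = min(profile)  # simplifying assumption: lowest value = background
    return sum(max(value - baseline, 0) for value in profile)


lane = [10, 12, 50, 90, 52, 11, 10]  # hypothetical values across one band
print(integrated_band_intensity(lane))  # 165
```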

GFP Purification from E. coli

All components used for purification of the GFP gene products were from Pharmacia Biotech (Piscataway, NJ) except as noted. The human codon-optimized gene for each protein was expressed in BL21 RIL codon plus (DE3) E. coli (Stratagene, San Diego, CA) as a fusion protein with glutathione S-transferase from pGEX6P-1-derived vectors. Each protein was purified using Glutathione Sepharose 4B beads as per the manufacturer's directions, and the mature GFP was cleaved from the fusion protein with PreScission Protease. The purified proteins ran as single bands by SDS-PAGE and appeared as single peaks of the expected molecular mass by MALDI-TOF mass spectrometry on a Bruker Reflex III instrument (Bruker Daltonics, Billerica, MA). Due to the cloning strategy, purified R. muelleri GFP has the amino acids GPLGSEF- and Ptilosarcus GFP the residues GPLGS- fused to their N-termini. Purified recombinant EGFP was from Clontech (Palo Alto, CA).

WO 02/090535    PCT/US02/14766

CD Studies 

CD spectra were recorded as described in Gururaja, T.L. et al. (2000) Chem Biol 7: 515-27, on an AVIV 62A DS CD spectrophotometer (Lakewood, N.J., USA) equipped with a Peltier temperature control unit. The temperature of the instrument was maintained constantly below 20°C using a Neslab CFT-33 refrigerated recirculator water bath. The device was periodically calibrated with the ammonium salt of (+)-10-camphorsulfonic acid according to the manufacturer's recommendations. Spectra were recorded between 200 and 250 nm at 0.2 nm intervals with a time constant of 1 s at 25°C in 10 mM phosphate buffer containing 100 mM KF, pH 7.5. A cylindrical quartz cell of path length 0.1 cm was used for this spectral range, with a sample concentration of 5 to 10 µM as determined by the method of Lowry, O. et al. (1951) J. Biol. Chem. 193: 265-275. Mean residue ellipticity (MRE) is expressed in deg·cm²·dmol⁻¹. Thermal denaturation was measured at 218 nm over a range of 4-98°C with a temperature step of 2°C, a 2 minute equilibration time, and a 60 s signal averaging time. The data were fitted to a logistic sigmoid equation using the Levenberg-Marquardt algorithm in Ultrafit (Biosoft, Cambridge, UK). CD spectra were deconvoluted with the program CDNN (CD neural network) downloaded from http://bioinformatik.biochemtech.uni-halle.de/cdnn/index.html.
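The thermal-denaturation fit can be reproduced outside Ultrafit. The sketch below is a self-contained Python Levenberg-Marquardt least-squares fit of a four-parameter logistic sigmoid: low-temperature plateau, high-temperature plateau, midpoint Tm, and slope. The text does not state Ultrafit's exact parameterization, so this form is an assumption. The example data are synthetic values generated on the same 4-98°C, 2°C grid described above.

```python
import math


def logistic(temp, params):
    """Four-parameter logistic: `low` plateau below Tm, `high` plateau above Tm."""
    low, high, tm, slope = params
    z = max(min((temp - tm) / slope, 700.0), -700.0)  # keep math.exp in range
    return high + (low - high) / (1.0 + math.exp(z))


def solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def levenberg_marquardt(model, xs, ys, p0, max_iter=200, tol=1e-12):
    """Minimize sum((y - model(x, p))**2) over p with Levenberg-Marquardt damping."""
    params, lam, n = list(p0), 1e-3, len(p0)

    def residuals(p):
        return [y - model(x, p) for x, y in zip(xs, ys)]

    res = residuals(params)
    cost = sum(r * r for r in res)
    for _ in range(max_iter):
        # Forward-difference Jacobian of the model with respect to each parameter.
        jac = []
        for x in xs:
            f0 = model(x, params)
            row = []
            for j in range(n):
                step = 1e-6 * max(1.0, abs(params[j]))
                shifted = params[:]
                shifted[j] += step
                row.append((model(x, shifted) - f0) / step)
            jac.append(row)
        jtj = [[sum(row[i] * row[j] for row in jac) for j in range(n)] for i in range(n)]
        jtr = [sum(row[i] * r for row, r in zip(jac, res)) for i in range(n)]
        while lam < 1e12:
            damped = [row[:] for row in jtj]
            for i in range(n):
                damped[i][i] += lam * (jtj[i][i] or 1.0)
            trial = [p + d for p, d in zip(params, solve(damped, jtr))]
            trial_res = residuals(trial)
            trial_cost = sum(r * r for r in trial_res)
            if trial_cost < cost:
                break
            lam *= 10.0  # rejected step: lean toward gradient descent
        else:
            return params  # no downhill step found; treat as converged
        improvement = cost - trial_cost
        params, res, cost = trial, trial_res, trial_cost
        lam = max(lam / 10.0, 1e-12)  # accepted step: lean toward Gauss-Newton
        if improvement < tol * (1.0 + cost):
            break
    return params


# Synthetic melt curve on the 4-98 °C, 2 °C grid described above.
temps = [4.0 + 2.0 * i for i in range(48)]
signal = [logistic(t, (-15.0, -3.0, 70.0, 3.0)) for t in temps]

# Starting guesses: end plateaus, and the temperature closest to the half-way signal.
half = (signal[0] + signal[-1]) / 2
tm0 = temps[min(range(len(temps)), key=lambda i: abs(signal[i] - half))]
fit = levenberg_marquardt(logistic, temps, signal, (signal[0], signal[-1], tm0, 5.0))
print(f"Tm = {fit[2]:.2f} °C")
```

The damping factor `lam` is what distinguishes Levenberg-Marquardt from plain Gauss-Newton. It grows when a step fails to lower the residual sum of squares and shrinks when a step succeeds.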

EXPERIMENT 2 

EXPRESSION OF RENILLA GFP CODON OPTIMIZED FOR EXPRESSION IN HUMAN CELLS 

Renilla muelleri and Ptilosarcus GFP genes were constructed with a glycine following the initial methionine to optimize translation (see Experiment 1). The sequences were codon optimized for efficient expression in human cells. These GFPs were introduced into Jurkat-E cells by retroviral delivery using the protocol of Swift et al., supra. Based on FACS analysis of scatter and propidium iodide staining of cell populations from 13 hours to 8 days post infection, neither Ptilosarcus nor Renilla GFP showed any observable toxicity. By 2 days post infection, the accumulation of intracellular GFP slowed to a steady-state level. Based on FL1 channel fluorescence, Ptilosarcus and Renilla GFPs reached this steady-state level more rapidly than EGFP. The excitation and emission maxima were 501 and 511 nm, 498 and 509 nm, and 489 and 510 nm for Ptilosarcus GFP, Renilla GFP, and Aequorea GFP, respectively.
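Codon optimization of this kind replaces codons that are rare in human genes with favored ones. A minimal sketch of the idea is a back-translation that uses a single preferred codon per amino acid. The table below is an assumption drawn from the codons that predominate in oligonucleotides R1-R10, not from a published codon-usage table. The actual designs do not follow it everywhere; for example, R1 uses GTT within its HpaI site.

```python
# Hypothetical one-codon-per-residue table, mostly matching R1-R10 usage.
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "CGC",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
    "*": "TAG",
}


def back_translate(protein: str) -> str:
    """Return one DNA sequence encoding `protein` using the preferred codons."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


ha_tag = back_translate("YPYDVPDYASL")
print(ha_tag)  # TACCCCTACGACGTGCCCGACTACGCCAGCCTG
```

With this table, the N-terminal peptide MGSKQILKNT back-translates to the coding part of the R forward primer (ATGGGCAGCAAGCAGATCCTGAAGAACACC). The last 29 bases of the HA back-translation match the start of the D forward primer in Experiment 1.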

The relative levels of wild-type and codon-optimized Renilla GFP and EGFP were analyzed by FACS at 4 days post infection. Based on geometric mean fluorescence values in the FL1 channel, codon-optimized Renilla GFP was expressed more than 28-fold higher than the wild-type cDNA sequence and was 1.4-fold brighter than EGFP.

Ptilosarcus GFP, Renilla GFP and EGFP fused at their carboxy termini to a linker-flag tag sequence, EEAAKA-DYKDDDDK, were expressed in Jurkat-E cells, and their fluorescence levels were compared by FACS. The Ptilosarcus and Renilla GFPs were on average 1.4-fold and 1.2-fold more fluorescent than EGFP, respectively. Lysates from 2.8 x 10* Jurkat-E cells, sorted for GFP fluorescence 8 days after infection, were compared by Western blot using anti-flag antibody. Each GFP gave a single band. EGFP migrated at a slightly higher molecular mass than the other two flag-tagged GFPs. The integrated intensity values derived using NIH Image were 3200, 3206, and 2314 for the Ptilosarcus GFP, Renilla GFP and EGFP bands, respectively, giving ratios of 1.4:1.4:1.0. Thus, both Renilla and Ptilosarcus GFPs are expressed at slightly higher levels than EGFP in these cells, making these codon-optimized constructs efficient reporter proteins.
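The band ratios follow directly from the integrated intensities reported for the Western blot. The short sketch below normalizes them to EGFP.

```python
# Integrated band intensities reported above (NIH Image units).
intensities = {"Ptilosarcus GFP": 3200, "Renilla GFP": 3206, "EGFP": 2314}

reference = intensities["EGFP"]
ratios = {name: value / reference for name, value in intensities.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
# Ptilosarcus GFP: 1.38
# Renilla GFP: 1.39
# EGFP: 1.00
```

Rounded to one decimal place, these values give the 1.4:1.4:1.0 ratio stated in the text.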

EXPERIMENT 3

EPITOPE TAG INSERTION FOR LOOP SUITABLE FOR PRESENTATION OF PEPTIDES 

To locate potential surface loops in Renilla GFP, the peptide sequence GQGGG YPYDVPDYASL GQAGGG, containing the influenza hemagglutinin (HA) epitope tag (YPYDVPDYA) flanked by two flexible linker sequences, was inserted into candidate sites corresponding to putative loops of Renilla GFP (see Experiment 1). Following retroviral delivery into human cells, the fluorescence of the modified GFPs was examined. Six different insertion sites, A-F, were tested in codon-optimized GFP. Figure 7 shows the fluorescence of the different modified Renilla GFPs retrovirally expressed in Jurkat-E cells and analyzed by FACS 4 days post infection. The geometric mean fluorescence values for the populations indicated by the gates are shown in the upper right corner of each FACS plot. These values are compared only for samples whose populations lie within the same dynamic range. All modified Renilla GFPs except the one with an insertion at position A were expressed and fluorescent. The rank order of fluorescence intensities was D > F >> B > E = C. Relative to the unaltered Renilla GFP, the average expression levels of Renilla GFP with the HA peptide at positions D and F were ca. 49% and 47%, and at positions B, C, and E less than 1%. Thus, positions D and F of Renilla GFP best tolerate insertion of the 22-mer peptide. Renilla GFP with the position D insertion was on average 2.3-fold more fluorescent than Aequorea EGFP carrying the identical 22-mer in its most fluorescent loop (Peelle et al. (2001) Chem. Biol. 8: 521-534). Comparing tag peptide insertions between Aequorea and Renilla GFPs, the most significant difference is at position F. In EGFP, the analogous site is a loop between two twisted beta strands, with a distance across the top of the loop of ca. 11 Å. An 8-mer peptide inserted into Aequorea GFP at the equivalent position F is only 0.6% as fluorescent as the parent GFP when expressed in yeast (Abedi et al. (1998) Nucleic Acids Res. 26: 623-30), whereas insertion of the 22-mer HA tag into position F of Renilla GFP retains 32% of the parent fluorescence. Thus, the Renilla structure appears to be significantly more tolerant than Aequorea GFP to insertion of peptides into this particular site. Although position F is likely to be surface exposed in both GFPs, its structure or significance in the folding pathway of Renilla GFP may differ from that in Aequorea GFP.

EXPERIMENT 4

INTRACELLULAR PRESENTATION OF A PEPTIDE ON A RENILLA GFP SCAFFOLD 

To examine Renilla GFP as a peptide display scaffold, the SV40-derived nuclear localization signal (NLS) PPKKKRKV, flanked by the glycine linkers used in the epitope tag scan, was inserted into sites D and F. This NLS peptide interacts with karyopherins in the nuclear pore complex for transport into the nucleus (Radu et al. (1995) Proc. Natl. Acad. Sci. USA 92: 1769-1773; Rexach, M. et al. (1995) Cell 83: 683-92; Moroianu, J. et al. (1995) Proc. Natl. Acad. Sci. USA 92: 2008-11). About 10« A549 cells retrovirally expressing Renilla GFP with the peptide inserted at site D or F were grown for 14 days and then observed by fluorescence microscopy. As a control for each experiment, the HA epitope tag flanked by 4 glycines, G4YPYDVPDYASLG4, was inserted along with the linker residues. GFP with this tag inserted at either site D or F fluoresced throughout the cell, while the NLS-containing insert showed only nuclear fluorescence, with some preferential localization to intra-nuclear structures for the loop D insert. The inserted peptide is thus solvent exposed and can functionally interact with its target in the cell. The use of Renilla GFP as a scaffold therefore provides an additional GFP peptide display site, possibly with a different structural bias, for phenotypic screening of peptide libraries.
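A sequence-level observation consistent with this result is that the inserted SV40 peptide contains the commonly cited consensus for a classical monopartite NLS core, K(K/R)X(K/R), whereas the HA control insert does not. The regular expression below encodes that consensus. It is a screening heuristic only and says nothing about whether the displayed peptide adopts a conformation that karyopherins can bind.

```python
import re

# Commonly cited consensus for a classical monopartite NLS core: K-(K/R)-X-(K/R).
# The lookahead lets overlapping matches be reported.
NLS_CORE = re.compile(r"(?=(K[KR].[KR]))")


def find_nls_cores(peptide: str) -> list:
    """Return (start, match) pairs for every K(K/R)X(K/R) core in `peptide`."""
    return [(m.start(), m.group(1)) for m in NLS_CORE.finditer(peptide.upper())]


print(find_nls_cores("PPKKKRKV"))             # [(2, 'KKKR'), (3, 'KKRK')]
print(find_nls_cores("GGGGYPYDVPDYASLGGGG"))  # []
```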
CLAIMS

We claim: 

1. A retroviral vector comprising a fusion nucleic acid comprising:

a) a promoter; and 

b) Renilla GFP gene. 

2. A retroviral vector comprising a fusion nucleic acid comprising: 
a) a promoter; and 

b) Ptilosarcus GFP gene.

3. A nucleic acid vector comprising a fusion nucleic acid comprising: 

a) a promoter; 

b) Renilla GFP gene; 

c) a separation site; and 

d) a gene of interest.

4. A nucleic acid vector comprising a fusion nucleic acid comprising:

a) a promoter; 

b) Ptilosarcus GFP gene; 

c) a separation site; and 

d) a gene of interest.

5. A vector according to claim 3 or 4, wherein said separation site is an IRES element.

6. A vector according to claim 3 or 4, wherein said separation site is a Type 2A sequence.

7. A vector according to claim 3 or 4, wherein said separation site is a protease recognition site.

8. A vector according to claim 3 or 4, wherein said gene of interest comprises a reporter gene.

9. A vector according to claim 3 or 4, wherein said gene of interest comprises a selection gene.

10. A vector according to claim 3 or 4, wherein said gene of interest comprises a nucleic acid encoding a dominant effect protein.

11. A vector according to claim 3 or 4, wherein said gene of interest comprises a cDNA. 

12. A vector according to claim 11, wherein said cDNA comprises a cDNA fragment.

13. A vector according to claim 3 or 4, wherein said gene of interest comprises a genomic DNA 
fragment. 

14. A vector according to claim 3 or 4, wherein said gene of interest comprises a random 
peptide. 

15. A vector according to claim 14, wherein said random peptide is biased.

16. A vector according to claim 3 or 4 comprising a retroviral vector. 

17. A fusion nucleic acid according to claim 1 or 3, wherein said GFP is codon optimized Renilla muelleri GFP.

18. A fusion nucleic acid according to claim 2 or 4, wherein said GFP is a codon optimized Ptilosarcus GFP.

19. A fusion nucleic acid according to claim 17, wherein said codon optimized Renilla muelleri GFP is codon optimized for expression in human cells.

20. A fusion nucleic acid according to claim 19, wherein said codon optimized Renilla muelleri GFP comprises SEQ ID NO: 1.

21. A fusion nucleic acid according to claim 18, wherein said codon optimized Ptilosarcus GFP is codon optimized for expression in human cells.

22. A fusion nucleic acid according to claim 21, wherein said codon optimized Ptilosarcus GFP comprises SEQ ID NO: 2.
23. A fusion nucleic acid comprising:

a) a gene of interest; and

b) a gene encoding codon optimized Renilla muelleri GFP.

24. A fusion nucleic acid comprising:

a) a gene of interest; and

b) a gene encoding codon optimized Ptilosarcus GFP.



25. A fusion nucleic acid according to claim 23 or 24, wherein said gene of interest comprises a cDNA.

26. A fusion nucleic acid according to claim 25, wherein said cDNA comprises a cDNA fragment.

27. A fusion nucleic acid according to claim 23 or 24, wherein said gene of interest comprises a genomic DNA fragment.

28. A fusion nucleic acid according to claim 23 or 24, wherein said gene of interest comprises a nucleic acid encoding a random peptide.

29. A fusion nucleic acid according to claim 28, wherein said random peptide is biased.

30. A fusion nucleic acid according to claim 23 or 24, wherein said GFP is codon optimized for expression in human cells.

31. A fusion nucleic acid according to claim 30, wherein said codon optimized Renilla muelleri GFP comprises SEQ ID NO: 1.

32. A fusion nucleic acid according to claim 30, wherein said codon optimized Ptilosarcus GFP comprises SEQ ID NO: 2.

33. A library of fusion nucleic acids each comprising a fusion nucleic acid according to claim 25, 26, 27, 28, or 29.
34. A library of cells comprising a library of fusion nucleic acids according to claim 30.

35. A cell comprising the fusion nucleic acid of claim 25, 26, 27, 28 or 29.

36. A method of screening for bioactive agents, said method comprising:

a) combining a candidate bioactive agent and a cell comprising a fusion nucleic acid comprising

i) a promoter;

ii) a codon optimized Renilla muelleri GFP;

b) screening said cells for an altered phenotype.

37. A method of screening for bioactive agents, said method comprising:

a) combining a candidate bioactive agent and a cell comprising a fusion nucleic acid comprising

i) a promoter;

ii) a codon optimized Ptilosarcus GFP;

b) screening said cells for an altered phenotype.

38. A method of screening for bioactive agents according to claim 36, said fusion nucleic acid
comprising: 

a) said promoter; 

b) said codon optimized Renilla muelleri GFP; 

c) a separation sequence; and 

d) a gene of interest. 

39. A method of screening for bioactive agents according to claim 37, said fusion nucleic acid 
comprising: 

a) said promoter; 

b) said codon optimized Ptilosarcus GFP; 

c) a separation sequence; and 

d) a gene of interest.
40. A method according to claim 38 or 39, wherein said gene of interest comprises a reporter gene.

41. A method according to claim 38 or 39, wherein said gene of interest comprises a selection gene.

42. A method according to claim 38 or 39, wherein said gene of interest comprises a nucleic acid encoding a dominant effect protein.

43. A method according to claim 36 or 38 wherein said promoter comprises an IL-4 inducible promoter and said method further comprising:

a) inducing said promoter with IL-4; and

b) detecting said altered phenotype comprising absence or presence of expression of said codon optimized Renilla muelleri GFP.

44. A method according to claim 37 or 39 wherein said promoter comprises an IL-4 inducible promoter and said method further comprising:

a) inducing said promoter with IL-4; and

b) detecting said altered phenotype comprising absence or presence of expression of said codon optimized Ptilosarcus GFP.

45. A method according to claim 36, 37, 38, 39, 43, or 44 further comprising c) isolating said cell.

46. A method according to claim 45 further comprising d) identifying the candidate agent responsible for said altered phenotype.
[Figure: amino acid sequence alignment of the GFPs ZFPS (231 residues), ANEM (229), DSFP (232), FP48 (231), PTIL (Ptilosarcus, 238), RENM (Renilla muelleri, 238) and AEQV (Aequorea victoria, 238), with a consensus line (CONS); letters mark the loop insertion positions A-F.]

FIGURE 2

[Figure: alignment of the codon-optimized (co) and wild-type (wt) Renilla muelleri GFP coding sequences.]

FIGURE 3

[Figure: alignment of the codon-optimized (co) and wild-type (wt) Ptilosarcus GFP coding sequences.]